  Linux directory usage question (Message 1 to 10 of 26)  
From: TC
Subject: Linux directory usage question
Date: 15 Sep 2009 18:44:46
Message: <4ab018de$1@news.povray.org>
To all Linux gurus here: can anybody tell me how many files can be stored in 
a Linux directory without performance degradation? Or is there no limit for 
directory entries on Linux file systems?



On a web server I have about a thousand product shots in a single directory 
(web shop demands it) - how many more before Linux stops liking me? ;-)



From: Darren New
Subject: Re: Linux directory usage question
Date: 15 Sep 2009 19:40:47
Message: <4ab025ff$1@news.povray.org>
TC wrote:
> On a web server I have about a thousand product shots in a single directory 
> (web shop demands it) - how many more before Linux stops liking me? ;-)

That depends more on the file system than the kernel. If you're using ext2 
or ext3 (I think both have it) there's a flag you can turn on that says to 
build extra indexes for directories. Otherwise, I think they're linear.

-- 
   Darren New, San Diego CA, USA (PST)
   I ordered stamps from Zazzle that read "Place Stamp Here".



From: TC
Subject: Re: Linux directory usage question
Date: 15 Sep 2009 20:32:12
Message: <4ab0320c@news.povray.org>
Well, I use ext2. If I understand ext2 correctly, you may have 32k 
subdirectories before Linux throws in the towel. Would it crash? Or be graceful 
about it? I surely will not try it out...
However, I cannot find anything about the number of files I can store in a 
single directory. I assume that this number is limited by disk space only.

But it is no good to assume. And since I know no Linux guru but am pretty 
sure there are quite a lot of them to be found here, I asked the question. I would 
hate to delve through tons of technical documentation.

I did not know about the indexing flag, though, thank you. Maybe I'll find 
more on it.

It's always a surprise what can or cannot be done with or to a filesystem if 
you take a closer look. I really hate the ADS on Windows NTFS, for 
instance - I find this an abomination. ;-)

"Darren New" <dne### [at] sanrrcom> wrote in message 
news:4ab025ff$1@news.povray.org...
> TC wrote:
>> On a web server I have about a thousand product shots in a single 
>> directory (web shop demands it) - how many more before Linux stops 
>> liking me? ;-)
>
> That depends more on the file system than the kernel. If you're using ext2 
> or ext3 (I think both have it) there's a flag you can turn on that says to 
> build extra indexes for directories. Otherwise, I think they're linear.
>
> -- 
>   Darren New, San Diego CA, USA (PST)
>   I ordered stamps from Zazzle that read "Place Stamp Here".



From: Darren New
Subject: Re: Linux directory usage question
Date: 15 Sep 2009 21:24:34
Message: <4ab03e52$1@news.povray.org>
TC wrote:
> Well, I use ext2. If I understand ext2 correctly, you may have 32k 
> subdirectories before Linux throws in the towel. 

I never heard of that. It's perhaps an artifact of how many links you can 
have to one file, since ".." in a subdirectory links back to the parent 
directory.
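
The link-count idea can be watched directly. A minimal sketch, assuming a traditional Unix-style file system where every subdirectory's ".." entry counts as one more hard link to the parent; the helper name link_counts is made up for illustration, and some file systems (btrfs, for one) do not follow this convention, so the numbers may not move at all there.

```python
import os
import tempfile


def link_counts(subdirs: int = 5) -> tuple[int, int]:
    """Return the parent's hard-link count before and after adding subdirectories."""
    with tempfile.TemporaryDirectory() as parent:
        before = os.stat(parent).st_nlink
        for i in range(subdirs):
            # Each new subdirectory carries a ".." entry pointing at the parent.
            os.mkdir(os.path.join(parent, f"sub{i}"))
        after = os.stat(parent).st_nlink
    return before, after


if __name__ == "__main__":
    before, after = link_counts(5)
    print(f"links before: {before}, after 5 subdirectories: {after}")
```

If the count grows by one per subdirectory, a cap on links per inode becomes a cap on subdirectories per directory.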

> Would it crash? Or be graceful 
> about it? I surely will not try it out...

Now you have me curious enough to try it. ;-)

> However, I cannot find anything about the number of files I can store in a 
> single directory. I assume that this number is limited by disk space only.

Most likely, since I think ext is in many ways similar to the original v7 
directory layout that BSD replaced the API for.

> But it is no good to assume. And since I know no Linux guru but am pretty 
> sure there are quite a lot of them to be found here, I asked the question. I would 
> hate to delve through tons of technical documentation.

Every time I've been foolish enough to build a system like that, I've put a 
layer of subdirectories between, so that file abcdefghijk.txt would be 
stored in /stuff/abc/def/ghi/jk.txt or some such.
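
A minimal sketch of that layout, assuming fixed three-character path components taken from the start of the file name; the function name sharded_path and its width/levels parameters are made up for illustration, and names too short to fill every level are simply rejected here.

```python
from pathlib import PurePosixPath


def sharded_path(root: str, filename: str, width: int = 3, levels: int = 3) -> PurePosixPath:
    """Map a flat file name onto nested subdirectories built from its leading characters."""
    stem, dot, ext = filename.rpartition(".")
    if not dot:
        # No extension: the whole name is the stem.
        stem, ext = filename, ""
    parts = [stem[i * width:(i + 1) * width] for i in range(levels)]
    rest = stem[levels * width:]
    if any(len(part) < width for part in parts) or not rest:
        raise ValueError(f"name too short to shard: {filename!r}")
    leaf = rest + (dot + ext if dot else "")
    return PurePosixPath(root, *parts, leaf)


print(sharded_path("/stuff", "abcdefghijk.txt"))
```

With three levels of three characters each, no single directory has to hold more than a small fraction of the files, at the cost of a few extra lookups per path.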

> I did not know about the indexing flag, though, thank you. Maybe I'll find 
> more on it.

I found it thru yast, but I'm sure you can turn it on and off with tune2fs 
or whatever it's called. Apparently, "-O dir_index" passed to mkfs.ext3 will 
do the trick, but I think you can turn it on with an fsck as well.
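
To see whether the flag is already set, the listing printed by "tune2fs -l <device>" includes a "Filesystem features:" line. A minimal sketch that scans such a listing, assuming the output has already been captured into a string; has_dir_index and the sample text are made up for illustration. As far as I know, the flag is switched on for an existing file system with "tune2fs -O dir_index <device>", and "e2fsck -fD <device>" on the unmounted file system then rebuilds the directories that already exist.

```python
def has_dir_index(tune2fs_listing: str) -> bool:
    """Report whether dir_index appears in the 'Filesystem features' line."""
    for line in tune2fs_listing.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Filesystem features":
            return "dir_index" in value.split()
    return False


# Illustrative sample only, not captured from a real system.
SAMPLE = (
    "Filesystem volume name:   <none>\n"
    "Filesystem features:      has_journal ext_attr resize_inode dir_index filetype\n"
)

print(has_dir_index(SAMPLE))
```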

> It's always a surprise what can or cannot be done with or to a filesystem if 
> you take a closer look. I really hate the ADS on Windows NTFS, for 
> instance - I find this an abomination. ;-)

Whyfor? Every file system nowadays has something like this, including Linux. 
And it's pretty useful and a logical extension of how NTFS organizes files 
anyway. What don't you like about it?

-- 
   Darren New, San Diego CA, USA (PST)
   I ordered stamps from Zazzle that read "Place Stamp Here".



From: Orchid XP v8
Subject: Re: Linux directory usage question
Date: 16 Sep 2009 03:44:47
Message: <4ab0976f$2@news.povray.org>
TC wrote:
> To all Linux gurus here: can anybody tell me how many files can be stored in 
> a Linux directory without performance degradation? Or is there no limit for 
> directory entries on Linux file systems?

Now I have a vague recollection of somebody at uni telling us that ext2 
works by storing 4096 file entries in an inode. Once you have more files 
than that, the inode stores a pointer to another inode. This inode 
contains not file pointers, but inode pointers. So at this level of 
indirection, you can have up to 4096 * 4096 files. I vaguely recall that 
if you exhaust this, it goes to a third level of indirection. But I 
can't remember whether it stops there.

Of course, given the source this information came from, it could be 
completely bogus. ;-)

-- 
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*



From: clipka
Subject: Re: Linux directory usage question
Date: 16 Sep 2009 06:27:22
Message: <4ab0bd8a$1@news.povray.org>
TC wrote:
> Well, I use ext2. If I understand ext2 correctly, you may have 32k 
> subdirectories before Linux throws in the towel. Would it crash? Or be graceful 
> about it? I surely will not try it out...
> However, I cannot find anything about the number of files I can store in a 
> single directory. I assume that this number is limited by disk space only.

WIYF (http://en.wikipedia.org/wiki/Ext2#File_system_limits):

"If the number of files in a directory exceeds 10000 to 15000 files, the 
user will normally be warned that operations can last for a long time 
unless directory indexing is enabled. The theoretical limit on the 
[...] relevant for practical situations."

You may also want to check out info on the tune2fs utility, which I'd 
expect the thing to use if you wanted to enable directory indexing. But 
of course you'd need full control over the file system for that, so if 
your shop is hosted on a shared server, you may be out of luck.

> It's always a surprise what can or cannot be done with or to a filesystem if 
> you take a closer look. I really hate the ADS on Windows NTFS, for 
> instance - I find this an abomination. ;-)

I think the main problem with those is that other file systems don't 
have them. They're a pretty neat idea to tag files with additional 
information.



From: clipka
Subject: Re: Linux directory usage question
Date: 16 Sep 2009 08:38:48
Message: <4ab0dc58@news.povray.org>
Orchid XP v8 wrote:
> Now I have a vague recollection of somebody at uni telling us that ext2 
> works by storing 4096 file entries in an inode. Once you have more files 
> than that, the inode stores a pointer to another inode. This inode 
> contains not file pointers, but inode pointers. So at this level of 
> indirection, you can have up to 4096 * 4096 files.

Not precisely.

Even in a virtually empty directory (or file, for that matter), the 
actual data is not stored in the inode itself; instead, in the simplest
case an inode refers to up to a dozen data blocks, which in turn 
store the actual data. Thus, data is limited to e.g. 48 kB on a file 
system with 4 kB data blocks.

For the many cases where this is insufficient, three more data 
block pointers are available, which are treated specially:

If required, the first of the three special pointers is used to 
reference a data block holding an additional list of pointers to actual 
data blocks - i.e. it would reference quite a bunch of additional data 
blocks via one level of indirection. Sticking to the 4 kB data block 
example, that would give another 1024 data blocks (1036 in total), for a 
bit above 4 MB of data in total. (Note that the 12 direct data block 
pointers are still used in this case.)

If even more data blocks are required, the second special pointer is 
used in a similar way, except that it references even more data blocks 
via two levels of indirection, adding another 1024x1024 data blocks for 
a bit above 4 GB of data in total. Ultimately, the third special block 
pointer would also be used in pretty much the same way, except that it 
would use three levels of indirection, which could address some 4 TB, 
although ext2 caps a single file at 2 TB with 4 kB blocks.
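
The pointer arithmetic can be sketched in a few lines, assuming 12 direct pointers in the inode and 4-byte block numbers (so a 4 kB indirection block holds 1024 pointers). The names below are made up for illustration, and the sketch only counts what the pointers can reach; it ignores any other limit the file system imposes.

```python
BLOCK_SIZE = 4096      # bytes per data block, as in the example above
POINTER_SIZE = 4       # assumed: 32-bit block numbers
DIRECT_POINTERS = 12   # direct block pointers held in the inode


def addressable_blocks(levels: int, block_size: int = BLOCK_SIZE) -> int:
    """Data blocks reachable when indirection levels 1..levels are all in use."""
    pointers_per_block = block_size // POINTER_SIZE
    indirect = sum(pointers_per_block ** k for k in range(1, levels + 1))
    return DIRECT_POINTERS + indirect


for levels in range(4):
    blocks = addressable_blocks(levels)
    print(f"{levels} indirection level(s): {blocks} blocks, {blocks * BLOCK_SIZE} bytes")
```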

(Note that the data blocks holding the indirection tables are not 
inodes; that would be a waste of resources, as inodes (a) store a host 
of other information aside from the data block pointers, and (b) are a 
rather precious resource in themselves, as only a fixed number of them 
exists.)

Obviously, the math you heard at university doesn't really work out, as 
it ignores that inodes and indirection blocks hold different numbers of 
block pointers. What's more, block size may vary from 1 kB to 8 kB, and 
directory entry sizes even vary at run-time depending on filename length.

On a sample directory on my Linux system, I figured that one data block 
(4 kB) contains roughly 75 directory entries.

With direct blocks only, this would give me a bit short of 1000 entries.
One level of indirection would give me roughly 78,000 entries.
Second level gives me roughly 79 million.
Third level gives me roughly 80 billion.
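
A rough sketch of the same estimate, assuming the classic ext2 directory entry layout (an 8-byte header followed by the name, padded to a multiple of 4 bytes), 4-byte block pointers, and blocks that are packed completely full. The function names are made up for illustration, and 75 entries per block is simply the figure measured above.

```python
def dir_entry_size(name_len: int) -> int:
    """Bytes used by one entry: 8-byte header plus the name, rounded up to 4."""
    return (8 + name_len + 3) & ~3


def entries_per_block(name_len: int, block_size: int = 4096) -> int:
    """How many entries with names of this length fit into one data block."""
    return block_size // dir_entry_size(name_len)


def max_entries(levels: int, per_block: int = 75, block_size: int = 4096) -> int:
    """Entries reachable with 12 direct blocks plus the given indirection levels."""
    pointers = block_size // 4
    blocks = 12 + sum(pointers ** k for k in range(1, levels + 1))
    return blocks * per_block


for levels in range(4):
    print(f"{levels} indirection level(s): {max_entries(levels)} entries")
```

Under these assumptions, about 75 entries per block corresponds to file names of roughly 45 characters.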

File operations will be bogged down long before that, however, as the 
directory entries are stored within the data blocks as an unordered 
linked list, which needs to be traversed for each single file lookup.

Not to mention that with so many files, the directory information alone 
would occupy 1/4 of the maximum file system capacity of ext2 (at a block 
size of 4 kB), which does not leave much room for actual data per file :-P


> I vaguely recall that
> if you exhaust this, it goes to a third level of indirection. But I 
> can't remember whether it stops there.

Definitely so. Though it's probably not much of an issue in practice :-) 
For a block size of 8kB, that would actually be sufficient to reference 
more blocks than the file system as a whole can handle :-)


> Of course, given the source this information came from, it could be 
> completely bogus. ;-)

Well, it's not too far off the mark. Unless that person is expected to 
train you to become Linux Gurus :-)



From: Darren New
Subject: Re: Linux directory usage question
Date: 16 Sep 2009 12:05:50
Message: <4ab10cde$1@news.povray.org>
clipka wrote:
> "If the number of files in a directory exceeds 10000 to 15000 files, the 
> user will normally be warned that operations can last for a long time 
> unless directory indexing is enabled. 

I hate to mention this, but if you put 100,000 files in a directory and then 
delete them all, operations on the directory will still be slow. An 'ls' on 
the empty directory, or even an rmdir, can take several minutes as the 
machine scans thru the directory making sure it is indeed empty before 
deleting it.
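
This is easy to try on a small scale. A minimal sketch, assuming the size reported by stat for the directory itself reflects the blocks it occupies; directory_size_trace is a made-up name, and the result depends heavily on the file system (ext2/ext3 are the ones I would expect to keep the directory at its grown size, others may shrink it or report something unrelated).

```python
import os
import tempfile


def directory_size_trace(count: int = 1000) -> tuple[int, int, int, int]:
    """Return the directory's size when new, when full, and after deleting
    every file, plus the number of entries left at the end."""
    with tempfile.TemporaryDirectory() as tmp:
        size_new = os.stat(tmp).st_size
        names = [os.path.join(tmp, f"file{i:06d}") for i in range(count)]
        for name in names:
            open(name, "w").close()   # create an empty file
        size_full = os.stat(tmp).st_size
        for name in names:
            os.remove(name)
        size_emptied = os.stat(tmp).st_size
        remaining = len(os.listdir(tmp))
    return size_new, size_full, size_emptied, remaining


if __name__ == "__main__":
    print(directory_size_trace(1000))
```

If the third number stays at the level of the second, the emptied directory still carries all the blocks it grew to, and anything that scans it has to walk them.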

> I think the main problem with those is that other file systems don't 
> have them. 

Nonsense. Every modern operating system has them. Macs have had them since 
400K floppies were the norm. NTFS has always (as far as I remember) had 
them. Newer Linux file systems have them (altho IIRC they're sometimes 
organized more as tag/value pairs) - JFS, XFS, Reiser, ZFS, etc. They call 
them "Extended attributes" under Linux. Not surprisingly, they're used 
similarly to how NTFS uses the streams. (Huh. According to wiki, even FAT 
supports them if you use the right kind on NT.)

The biggest problem is that POSIX doesn't support them, so implementors 
aren't sure how to build a non-proprietary interface to them that will be 
accepted.

http://en.wikipedia.org/wiki/Extended_file_attributes
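
On Linux, Python exposes these as os.setxattr and os.getxattr. A minimal sketch, assuming a Linux system and a file system that accepts attributes in the "user." namespace; xattr_roundtrip and the attribute name are made up for illustration, and the function returns None where the calls are unavailable or refused.

```python
import os
import tempfile


def xattr_roundtrip(name: str = "user.comment", value: bytes = b"product shot"):
    """Attach an extended attribute to a temporary file and read it back."""
    if not hasattr(os, "setxattr"):
        return None                   # these calls only exist on Linux
    with tempfile.NamedTemporaryFile() as handle:
        try:
            os.setxattr(handle.name, name, value)
            return os.getxattr(handle.name, name)
        except OSError:
            return None               # file system refuses user attributes


if __name__ == "__main__":
    print(xattr_roundtrip())
```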

-- 
   Darren New, San Diego CA, USA (PST)
   I ordered stamps from Zazzle that read "Place Stamp Here".



From: Darren New
Subject: Re: Linux directory usage question
Date: 16 Sep 2009 12:06:53
Message: <4ab10d1d$1@news.povray.org>
Orchid XP v8 wrote:
> TC wrote:
>> To all Linux gurus here: can anybody tell me how many files can be 
>> stored in a Linux directory without performance degradation? Or is 
>> there no limit for directory entries on Linux file systems?
> 
> Now I have a vague recollection of somebody at uni telling us that ext2 
> works by storing 4096 file entries in an inode.

Heh. You're confusing free space, used space, and i-nodes (which are 
descriptors for files).

> Of course, given the source this information came from, it could be 
> completely bogus. ;-)

It is.

-- 
   Darren New, San Diego CA, USA (PST)
   I ordered stamps from Zazzle that read "Place Stamp Here".



From: Orchid XP v8
Subject: Re: Linux directory usage question
Date: 16 Sep 2009 12:07:16
Message: <4ab10d34@news.povray.org>
>> Of course, given the source this information came from, it could be 
>> completely bogus. ;-)
> 
> Well, it's not too far off the mark. Unless that person is expected to 
> train you to become Linux Gurus :-)

No. It was a first course in filesystems. I imagine the guy picked ext2 
because it was easy to look up the reference material. (We never, ever 
used anything that actually had ext2 on it...)

While we're on the subject... NTFS has an optimisation where "small" 
files are stored in the same block as the directory entry. (Saves 
seeking and wasting half a disk block.) Does ext2 have any optimisations 
for small files?

-- 
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*




Copyright 2003-2023 Persistence of Vision Raytracer Pty. Ltd.