POV-Ray: Newsgroups: povray.off-topic: Linux directory usage question

From: clipka
Subject: Re: Linux directory usage question
Date: 16 Sep 2009 08:38:48
Message: <4ab0dc58@news.povray.org>

Orchid XP v8 wrote:
> Now I have a vague recollection of somebody at uni telling us that ext2 
> works by storing 4096 file entries in an inode. Once you have more files 
> than that, the inode stores a pointer to another inode. This inode 
> contains not file pointers, but inode pointers. So at this level of 
> indirection, you can have up to 4096 * 4096 files.

Not precisely.

Even in a virtually empty directory (or file, for that matter), the 
actual data is not stored in the inode itself; instead, in the simplest
case an inode will refer to up to a dozen data blocks, which in turn 
store the actual data. Thus, data is limited to e.g. 48 kB on a file 
system with 4 kB data blocks.

For the many cases where this should be insufficient, three more data 
block pointers are available, which are treated specially:

If required, the first of the three special pointers is used to 
reference a data block holding an additional list of pointers to actual 
data blocks - i.e. it would reference quite a bunch of additional data 
blocks via one level of indirection. Sticking to the 4 kB data block 
example (with 4-byte block pointers), that would give another 1024 data 
blocks (1036 in total), for a bit above 4 MB of data in total. (Note 
that the 12 direct data block pointers are still used in this case.)

If even more data blocks are required, the second special pointer is 
used in a similar way, except that it references even more data blocks 
via two levels of indirection, adding another 1024x1024 data blocks for 
a bit above 4 GB of data in total. Ultimately, the third special block 
pointer would also be used in pretty much the same way, except that it 
would use three levels of indirection, for some 4 TB of addressable data 
in total (though ext2 itself caps a single file at 2 TB with 4 kB blocks).
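
As a rough check on the arithmetic above, here is a minimal Python sketch 
that adds up how many data blocks each level of indirection can reach. It 
assumes 4-byte block pointers and 12 direct pointers per inode; the 
function names are made up for illustration, and other ext2 limits may 
cap a real file below the raw figure.

```python
POINTER_SIZE = 4       # assumed bytes per block pointer
DIRECT_POINTERS = 12   # direct block pointers held in the inode


def addressable_blocks(block_size):
    """Cumulative data blocks reachable with 0, 1, 2 and 3 indirection levels."""
    per_block = block_size // POINTER_SIZE  # pointers fitting in one indirection block
    reachable = DIRECT_POINTERS
    totals = [reachable]
    for level in (1, 2, 3):
        reachable += per_block ** level
        totals.append(reachable)
    return totals


def human(n_bytes):
    """Format a byte count using binary multiples (1 kB = 1024 B)."""
    for unit in ("B", "kB", "MB", "GB", "TB"):
        if n_bytes < 1024 or unit == "TB":
            return f"{n_bytes:.1f} {unit}"
        n_bytes /= 1024


for blocks in addressable_blocks(4096):
    print(blocks, "blocks ->", human(blocks * 4096))
```

Calling addressable_blocks() with 1024 or 8192 instead shows how strongly 
the limits depend on the block size.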

(Note that the data blocks holding the indirection tables are not 
inodes; that would be a waste of resources, as inodes (a) store a host 
of other information aside from the data block pointers, and (b) are a 
rather precious resource in themselves, as only a fixed number of them 
exists.)

Obviously, the math you heard at university doesn't really work out, as 
it ignores that inodes and indirection blocks have different capacities 
for block pointers. What's more, block size may vary from 1kB to 8kB, and 
directory entry sizes even vary at run-time depending on filename length.

On a sample directory on my Linux system, I figured that one data block 
(4 kB) contains roughly 75 directory entries.

With direct blocks only, this would give me a bit short of 1000 entries.
One level of indirection would give me roughly 78,000 entries.
Second level gives me roughly 79 million.
Third level gives me roughly 80 billion.
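
For anyone who wants to redo this estimate with their own numbers, a 
small Python sketch follows. It assumes the 75 entries per 4 kB block 
measured above, 4-byte block pointers and 12 direct pointers; 
directory_capacity() is a hypothetical helper, not anything ext2 provides.

```python
def directory_capacity(block_size=4096, entries_per_block=75,
                       pointer_size=4, direct=12):
    """Estimate directory entries reachable at each indirection level."""
    per_block = block_size // pointer_size
    blocks = direct
    result = {"direct only": blocks * entries_per_block}
    labels = ("single indirect", "double indirect", "triple indirect")
    for level, label in enumerate(labels, start=1):
        blocks += per_block ** level          # levels are cumulative
        result[label] = blocks * entries_per_block
    return result


for label, entries in directory_capacity().items():
    print(f"{label:>16}: {entries:,} entries")
```

Shorter filenames pack more entries into a block, so entries_per_block is 
the parameter to vary first.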

File operations will be bogged down long before that, however, as the 
directory entries are stored within the data blocks as an unordered 
linked list, which needs to be traversed for each single file lookup.
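
To make the cost of that traversal concrete, here is a hedged Python 
sketch of a linear lookup. The record layout (inode number, record 
length, name length, file type, then the name) is modelled loosely on 
ext2's on-disk directory entry; pack_entry() and lookup() are 
illustrative names, and the block is built in memory rather than read 
from a disk.

```python
import struct

# inode (u32), rec_len (u16), name_len (u8), file_type (u8), little-endian
HEADER = struct.Struct("<IHBB")


def pack_entry(inode, name):
    """Build one directory record, padded to a 4-byte boundary."""
    size = HEADER.size + len(name)
    rec_len = (size + 3) & ~3
    return HEADER.pack(inode, rec_len, len(name), 1) + name + b"\0" * (rec_len - size)


def lookup(block, wanted):
    """Walk records front to back; return (inode or None, records examined)."""
    offset = examined = 0
    while offset < len(block):
        inode, rec_len, name_len, _ftype = HEADER.unpack_from(block, offset)
        if rec_len < HEADER.size:
            raise ValueError("corrupt record length")
        examined += 1
        start = offset + HEADER.size
        if inode != 0 and block[start:start + name_len] == wanted:
            return inode, examined
        offset += rec_len  # rec_len acts as the "next" link of the list
    return None, examined


names = [f"file{i:03d}".encode() for i in range(75)]
block = b"".join(pack_entry(100 + i, n) for i, n in enumerate(names))
print(lookup(block, b"file074"))   # found only after scanning every record
print(lookup(block, b"missing"))   # a miss also scans every record
```

The number of records examined grows linearly with the directory size, 
which is what an index over the names is meant to avoid.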

Not to mention that with so many files, the directory information alone 
would occupy 1/4 of the maximum file system capacity of ext2 (at a block 
size of 4 kB), which does not leave much room for actual data per file :-P


> I vaguely recall that
> if you exhaust this, it goes to a third level of indirection. But I 
> can't remember whether it stops there.

Definitely so. Though it's probably not much of an issue in practice :-) 
For a block size of 8kB, that would actually be sufficient to reference 
more blocks than the file system as a whole can handle :-)


> Of course, given the source this information came from, it could be 
> completely bogus. ;-)

Well, it's not too far off the mark. Unless that person is expected to 
train you to become Linux Gurus :-)


From: Darren New
Subject: Re: Linux directory usage question
Date: 16 Sep 2009 12:05:50
Message: <4ab10cde$1@news.povray.org>

clipka wrote:
> "If the number of files in a directory exceeds 10000 to 15000 files, the 
> user will normally be warned that operations can last for a long time 
> unless directory indexing is enabled. 

I hate to mention this, but if you put 100,000 files in a directory and then 
delete them all, operations on the directory will still be slow. An 'ls' on 
the empty directory, or even an rmdir, can take several minutes as the 
machine scans thru the directory making sure it is indeed empty before 
deleting it.

> I think the main problem with those is that other file systems don't 
> have them. 

Nonsense. Every modern operating system has them. Macs have had them since 
400K floppies were the norm. NTFS has always (as far as I remember) had 
them. Newer Linux file systems have them (altho IIRC they're sometimes 
organized more as tag/value pairs) - JFS, XFS, Reiser, ZFS, etc. They call 
them "Extended attributes" under Linux. Not surprisingly, they're used 
similarly to how NTFS uses the streams. (Huh. According to wiki, even FAT 
supports them if you use the right kind on NT.)

The biggest problem is that POSIX doesn't support them, so implementors 
aren't sure how to build a non-proprietary interface to them that will be 
accepted.

http://en.wikipedia.org/wiki/Extended_file_attributes
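
For the curious, a minimal Python sketch of setting and reading one 
extended attribute on Linux. It uses os.setxattr and os.getxattr, which 
Python only provides on Linux; the attribute name "user.comment" and the 
helper roundtrip_xattr() are just examples, and the call may fail on file 
systems mounted without user attribute support, so the sketch returns 
None in that case.

```python
import os
import tempfile


def roundtrip_xattr(path, name="user.comment", value=b"hello"):
    """Set one extended attribute and read it back; None if unsupported here."""
    if not hasattr(os, "setxattr"):   # non-Linux platforms
        return None
    try:
        os.setxattr(path, name, value)
        return os.getxattr(path, name)
    except OSError:                   # e.g. file system without user xattrs
        return None


with tempfile.NamedTemporaryFile() as tmp:
    print(roundtrip_xattr(tmp.name))
```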

-- 
   Darren New, San Diego CA, USA (PST)
   I ordered stamps from Zazzle that read "Place Stamp Here".

From: Darren New
Subject: Re: Linux directory usage question
Date: 16 Sep 2009 12:06:53
Message: <4ab10d1d$1@news.povray.org>

Orchid XP v8 wrote:
> TC wrote:
>> To all Linux gurus here: can anybody tell me how many files can be 
>> stored in a Linux directory without performance degradation? Or is 
>> there no limit for directory entries on Linux file systems?
> 
> Now I have a vague recollection of somebody at uni telling us that ext2 
> works by storing 4096 file entries in an inode.

Heh. You're confusing free space, used space, and i-nodes (which are 
descriptors for files).

> Of course, given the source this information came from, it could be 
> completely bogus. ;-)

It is.

-- 
   Darren New, San Diego CA, USA (PST)
   I ordered stamps from Zazzle that read "Place Stamp Here".

From: Orchid XP v8
Subject: Re: Linux directory usage question
Date: 16 Sep 2009 12:07:16
Message: <4ab10d34@news.povray.org>

>> Of course, given the source this information came from, it could be 
>> completely bogus. ;-)
> 
> Well, it's not too far off the mark. Unless that person is expected to 
> train you to become Linux Gurus :-)

No. It was a first course in filesystems. I imagine the guy picked ext2 
because it was easy to look up the reference material. (We never, ever 
used anything that actually had ext2 on it...)

While we're on the subject... NTFS has an optimisation where "small" 
files are stored in the same block as the directory entry. (Saves 
seeking and wasting half a disk block.) Does ext2 have any optimisations 
for small files?

-- 
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*

From: Darren New
Subject: Re: Linux directory usage question
Date: 16 Sep 2009 12:19:12
Message: <4ab11000@news.povray.org>

Orchid XP v8 wrote:
> No. It was a first course in filesystems. I imagine the guy picked ext2 
> because it was easy to look up the reference material. 

Well, he still failed. Or you misunderstood what he was saying. :-)

If you want to know how ext2 works, look up how Unix v7's file system worked 
about 30 years ago. It's essentially the same, except in *where* it stores 
things physically on the disk. The concepts are all the same.

> While we're on the subject... NTFS has an optimisation where "small" 
> files are stored in the same block as the directory entry. (Saves 
> seeking and wasting half a disk block.) Does ext2 have any optimisations 
> for small files?

NTFS's "i-nodes" (called MFT records) are some 1K to 4K in size. Ext2's 
inodes are closer to 64 bytes or something. There's no slack space to speak 
of in an ext2 i-node.

Anyway, the idea of the data being stored in the same place as the 
attributes on NTFS is based on the fact that the data, the permissions, and 
the locations where other data is stored, is all the same sort of "stuff". 
If your list of permissions gets too big, or your file has lots of 
fragments, those too might wind up being stored in the "data" area of the disk.

-- 
   Darren New, San Diego CA, USA (PST)
   I ordered stamps from Zazzle that read "Place Stamp Here".

From: Warp
Subject: Re: Linux directory usage question
Date: 16 Sep 2009 12:25:58
Message: <4ab11196@news.povray.org>

TC <do-not-reply@i-do get-enough-spam-already-2498.com> wrote:
> To all Linux gurus here: can anybody tell me how many files can be stored in 
> a Linux directory without performance degradation? Or is there no limit for 
> directory entries on Linux file systems?

http://en.wikipedia.org/wiki/Comparison_of_file_systems#Limits

  I think most file systems (such as ReiserFS) have O(log n) indexing for
files in a directory. So they don't get degraded at all.

-- 
                                                          - Warp

From: Darren New
Subject: Re: Linux directory usage question
Date: 16 Sep 2009 12:31:13
Message: <4ab112d1$1@news.povray.org>

Warp wrote:
>   I think most file systems (such as ReiserFS) have O(log n) indexing for
> files in a directory. So they don't get degraded at all.

It's an option called "dir_index" on ext2/3 that you can set with tune2fs or 
with the appropriate fsck (and of course during mkfs).  Just in case you 
ever need to know. :-)

-- 
   Darren New, San Diego CA, USA (PST)
   I ordered stamps from Zazzle that read "Place Stamp Here".

From: Orchid XP v8
Subject: Re: Linux directory usage question
Date: 16 Sep 2009 15:25:55
Message: <4ab13bc3@news.povray.org>

>> No. It was a first course in filesystems. I imagine the guy picked 
>> ext2 because it was easy to look up the reference material. 
> 
> Well, he still failed. Or you misunderstood what he was saying. :-)

It was... my God... about ten years ago now. o_O

>> While we're on the subject... NTFS has an optimisation where "small" 
>> files are stored in the same block as the directory entry. (Saves 
>> seeking and wasting half a disk block.) Does ext2 have any 
>> optimisations for small files?
> 
> NTFS's "i-nodes" (called MFT records) are some 1K to 4K in size. Ext2's 
> inodes are closer to 64 bytes or something. There's no slack space to 
> speak of in an ext2 i-node.

Sure. I was just wondering if ext2 does anything special with small 
files, that's all.

Since files can only be allocated an integral number of data blocks, 
really tiny files potentially waste an entire block. A directory full of 
millions of tiny files could actually eat quite a lot of space. But by 
putting that data inside the directory itself, you avoid all that wasted 
space, and save on some disk seek time to boot. It seems like a neat trick.
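
To put a number on that wasted space, here is a small Python sketch. It 
assumes every file is rounded up to whole 4 kB blocks and ignores inode 
and directory overhead; the one million 100-byte files are invented 
purely for illustration.

```python
def allocated_bytes(file_size, block_size=4096):
    """Bytes actually reserved for a file, rounded up to whole blocks."""
    blocks = -(-file_size // block_size)   # ceiling division
    return blocks * block_size


def slack(file_sizes, block_size=4096):
    """Total bytes allocated but unused across all files."""
    return sum(allocated_bytes(s, block_size) - s for s in file_sizes)


tiny_files = [100] * 1_000_000             # hypothetical: a million 100-byte files
payload = sum(tiny_files)
wasted = slack(tiny_files)
print(f"{payload / 2**20:.0f} MiB payload, {wasted / 2**20:.0f} MiB slack")
```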

Then again, I've sometimes wondered what would happen if you had some 
filesystem that split the disk into several separate regions with 
different block sizes, and allocated files accordingly. (I.e., put the 
really huge files in the area with big blocks, and the tiny files in 
some area with tiny block sizes.) I rather suspect you'd permanently be 
running out of whichever size you happen to need the most tho...

-- 
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*

From: Darren New
Subject: Re: Linux directory usage question
Date: 16 Sep 2009 15:43:28
Message: <4ab13fe0@news.povray.org>

Orchid XP v8 wrote:
> Sure. I was just wondering if ext2 does anything special with small 
> files, that's all.

Right. It could, for example, pack data from multiple small files into one 
sector/block/whatever. But i-nodes are too small for that.

> putting that data inside the directory itself,

Now you're talking about directories, which are different from i-nodes. And 
no, if you put the data in the directory, then you couldn't have multiple 
links to a small file.

> Then again, I've sometimes wondered what would happen if you had some 
> filesystem that split the disk into several separate regions with 
> different block sizes, and allocated files accordingly. (I.e., put the 
> really huge files in the area with big blocks, and the tiny files in 
> some area with tiny block sizes.) I rather suspect you'd permanently be 
> running out of whichever size you happen to need the most tho...

An interesting thought. I've never seen that done.  Given that disks are 
broken into sectors all the same size, and given that the only reasons for 
allocating space in units larger than one sector are defragmentation and 
efficiency of storing pointers to clusters, there's no real good reason for it.

On the other hand, the Amiga formatted the floppy track every time it wrote 
the track, so you could probably actually fit more large files on a disk 
than small files, even if every small file was exactly one sector, by making 
the sectors physically larger on tracks where they store a big file.

-- 
   Darren New, San Diego CA, USA (PST)
   I ordered stamps from Zazzle that read "Place Stamp Here".

From: clipka
Subject: Re: Linux directory usage question
Date: 16 Sep 2009 17:44:57
Message: <4ab15c59@news.povray.org>

Darren New wrote:
>> Then again, I've sometimes wondered what would happen if you had some 
>> filesystem that split the disk into several separate regions with 
>> different block sizes, and allocated files accordingly. (I.e., put the 
>> really huge files in the area with big blocks, and the tiny files in 
>> some area with tiny block sizes.) I rather suspect you'd permanently 
>> be running out of whichever size you happen to need the most tho...
> 
> An interesting thought. I've never seen that done.  Given that disks 
> are broken into sectors all the same size, and given that the only 
> reasons for allocating space in units larger than one sector are 
> defragmentation and efficiency of storing pointers to clusters, there's 
> no real good reason for it.

You are aware that modern file systems use block sizes /significantly/ 
larger than the disk sector size?

A disk sector is 512 bytes in size virtually everywhere, while file 
systems typically use block sizes one order of magnitude larger.

Why? Because it is actually more memory-efficient to /not/ use even the 
smallest gaps - because that inflates the required management overhead, 
severely reducing the total payload capacity when the files are 
sufficiently /large/ on average.

In the end, some compromise is used, based on the statistical 
distribution of file sizes. A really /good/ system administrator might 
tune the various volumes on his systems to have block sizes that match 
the actual use.
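
A rough Python sketch of that compromise: for a given mix of file sizes 
it compares the slack lost to rounding against the bytes spent on block 
pointers, for several block sizes. It assumes one 4-byte pointer per data 
block and ignores everything else (indirection blocks, bitmaps, inodes); 
the file mix and the overhead() helper are invented for illustration.

```python
def overhead(file_sizes, block_size, pointer_size=4):
    """Return (slack_bytes, pointer_bytes) for storing the given files."""
    slack = pointers = 0
    for size in file_sizes:
        blocks = -(-size // block_size)        # ceiling division
        slack += blocks * block_size - size    # unused tail of the last block
        pointers += blocks * pointer_size      # one pointer per data block
    return slack, pointers


small = [700] * 1000            # hypothetical: many tiny files
large = [50_000_000] * 10       # hypothetical: a few big files

for block_size in (512, 4096, 65536):
    for label, files in (("small", small), ("large", large)):
        slack, pointers = overhead(files, block_size)
        print(f"{block_size:>6} B blocks, {label}: "
              f"slack {slack:,} B, pointers {pointers:,} B")
```

Small blocks keep slack low but multiply the pointers needed for large 
files; large blocks do the opposite, which is why the best choice depends 
on the file size distribution.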

> On the other hand, the Amiga formatted the floppy track every time it 
> wrote the track, so you could probably actually fit more large files on 
> a disk than small files, even if every small file was exactly one 
> sector, by making the sectors physically larger on tracks where they 
> store a big file.

That won't work for hard disk drives: Even if you /could/ still 
low-level-format them (and maybe that's actually still possible with 
special tools), it would be a particularly bad idea, given that they 
don't even disclose their actual drive geometry anymore these days 
(aside from the /official/ total capacity - but even that may be only 
half the truth, as I have heard that modern hard drives reserve some 
sectors as spare, to deal with sectors that over time begin to "almost 
lose data", i.e. become seriously difficult to read - so those can be 
avoided and operation can safely continue without any true problems 
while the system administrator orders a new drive - provided he has kept 
an eye on the SMART status of his drives).
