POV-Ray: Newsgroups: povray.off-topic: Nice data structure

From: Warp
Subject: Re: Nice data structure
Date: 21 Jun 2009 09:57:55
Message: <4a3e3c63@news.povray.org>

Darren New <dne### [at] sanrrcom> wrote:
> Warp wrote:
> >   I suppose that if you limit the size of the data container to 2^24 nodes,

> No, you could use "offsets to next element" or something, too, methinks. But 
> yeah, that's kind of the thing I was thinking about. By the time you have 
> more than 2^16 nodes, you're probably worrying more about speed, paging, etc 
> than you are about raw size, perhaps. And you're going to start to see a lot 
> of overlap in the earlier parts of the strings.

  I'd say it's the exact opposite: When you start having more than 2^16
nodes, that's when you need to start worrying about the amount of space the
data structure is taking. With fewer than 2^16 nodes the size of an individual
node is rather irrelevant.

-- 
                                                          - Warp

From: Warp
Subject: Re: Nice data structure
Date: 21 Jun 2009 10:01:13
Message: <4a3e3d29@news.povray.org>

clipka <nomail@nomail> wrote:
> This would require your data to have a linear distribution.

  No, it wouldn't. For an alphabet of e.g. 26 characters you need to perform
at most 5 steps per character (since 2^5 = 32 >= 26), period, completely
regardless of what the stored strings are.

  With an unbalanced tree the worst-case scenario is that you end up having
to perform 25 steps per character.
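
The two figures above can be checked with a minimal sketch. It assumes the alphabet is the index range 0..25, that "balanced" means each node splits the remaining range in half, and that "unbalanced" means the degenerate chain you get from inserting the letters in sorted order. The function names are made up for illustration.

```python
def halving_steps(index, alphabet_size=26):
    """Count the halvings needed to narrow [0, alphabet_size) down to `index`."""
    lo, hi = 0, alphabet_size
    steps = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if index < mid:
            hi = mid      # target is in the lower half
        else:
            lo = mid      # target is in the upper half
        steps += 1
    return steps


def chain_steps(index):
    """Links followed to reach `index` in a degenerate chain a -> b -> ... -> z."""
    return index


worst_balanced = max(halving_steps(i) for i in range(26))
worst_chain = max(chain_steps(i) for i in range(26))
print(worst_balanced, worst_chain)
```

Neither count depends on which strings are stored, only on the alphabet size.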

-- 
                                                          - Warp

From: Darren New
Subject: Re: Nice data structure
Date: 21 Jun 2009 13:38:08
Message: <4a3e7000$1@news.povray.org>

Warp wrote:
>   I'd say it's the exact opposite: When you start having more than 2^16
> nodes, that's when you need to start worrying about the amount of space the
> data structure is taking. With fewer than 2^16 nodes the size of an individual
> node is rather irrelevant.

My thought was that if you have that much data, you're probably running on a 
machine where the memory limits aren't very tight (i.e., a desktop system, 
say).  Of course, the killer is the ratio of the number of nodes to the 
amount of memory, so sure, when you start getting up in the 2^25 number of 
nodes range the size of the nodes is going to be prohibitive on a 32-bit 
machine again.

The nice thing is the tree doesn't need rotations, so you can clip off a 
large sub-tree of the tree and put it in its own address space. That's hard 
to do if you have to balance nodes and stuff.

-- 
   Darren New, San Diego CA, USA (PST)
   Insanity is a small city on the western
   border of the State of Mind.

From: Warp
Subject: Re: Nice data structure
Date: 21 Jun 2009 14:27:55
Message: <4a3e7baa@news.povray.org>

Darren New <dne### [at] sanrrcom> wrote:
> Warp wrote:
> >   I'd say it's the exact opposite: When you start having more than 2^16
> > nodes, that's when you need to start worrying about the amount of space the
> > data structure is taking. With fewer than 2^16 nodes the size of an individual
> > node is rather irrelevant.

> My thought was that if you have that much data, you're probably running on a 
> machine where the memory limits aren't very tight (i.e., a desktop system, 
> say).

  Well, I may be biased because of my programming background, but usually
an app crunching tons of data requires space-efficient data containers
(so that it can crunch as much data as possible with a given amount of RAM).
You usually want your app to be able to handle as much data as possible.

  My recent programming history may also affect how I view these things,
namely programming for handheld systems. For example, games which require
a full (e.g. English) dictionary are an extremely typical case where you
need a very fast data container which takes as little space as possible
(because a handheld system won't have a multi-GHz CPU or gigabytes of RAM).
A typical English dictionary is excruciatingly annoying in this regard
because it usually goes just a bit over the magical limit of 65536 words,
especially if the game requires plurals and other inflected forms. (And even
if the dictionary did have under 65536 words, the total amount of character
data would still be well beyond that limit, and since words are of differing
lengths you really need byte-pointers/indices.)
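
As a rough illustration of the byte-offset point, here is a minimal sketch that packs words into one blob and addresses them by byte offset. The helper names (`pack_words`, `word_at`, `bits_needed`) and the tiny word list are hypothetical, not taken from any real dictionary format.

```python
def pack_words(words):
    """Concatenate words into one bytes blob plus a table of start offsets."""
    blob = bytearray()
    offsets = []
    for w in words:
        offsets.append(len(blob))   # byte position where this word starts
        blob += w.encode("ascii")
    offsets.append(len(blob))       # sentinel: end of the last word
    return bytes(blob), offsets


def word_at(blob, offsets, i):
    """Recover word number `i` from the blob using its offset pair."""
    return blob[offsets[i]:offsets[i + 1]].decode("ascii")


def bits_needed(max_value):
    """Smallest unsigned width (in bits) that can hold `max_value`."""
    return max(1, max_value.bit_length())


blob, offsets = pack_words(["cat", "cats", "dog"])
print(word_at(blob, offsets, 1))
print(bits_needed(65535), bits_needed(65536))
```

Once either the word count or the total character data passes 65535, a 16-bit index no longer fits, which is the "magical limit" described above.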

>  Of course, the killer is the ratio of the number of nodes to the 
> amount of memory, so sure, when you start getting up in the 2^25 number of 
> nodes range the size of the nodes is going to be prohibitive on a 32-bit 
> machine again.

  Hmm, I'm not so sure. 2^25 nodes, each node taking eg. 16 bytes, would
require "only" 512 MB of RAM, which is well below the capacity of any
modern desktop system.
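
The arithmetic behind that figure, as a short sketch. The 16-byte node size is the example value from the paragraph above; comparing against a full 32-bit address space is an added assumption for context.

```python
nodes = 2 ** 25
node_bytes = 16                       # example per-node size from the post
total_bytes = nodes * node_bytes      # 2^25 * 2^4 = 2^29 bytes
total_mib = total_bytes // 2 ** 20    # convert bytes to MiB

# Share of a 32-bit (4 GiB) address space, ignoring allocator overhead.
share_of_32bit = total_bytes / 2 ** 32
print(total_mib, share_of_32bit)
```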

-- 
                                                          - Warp

From: Darren New
Subject: Re: Nice data structure
Date: 21 Jun 2009 17:57:45
Message: <4a3eacd9$1@news.povray.org>

Warp wrote:
> You usually want your app to be able to handle as much data as possible.

Yeah. My background tends towards needing to crunch arbitrary amounts of 
data, and hence I look at data structures that cache to disk more 
efficiently. The sorts of stuff I write can't rely on having enough memory 
to hold it all, but also tend to assume you're not going to look at all of 
it at once (like a database of customers, for example, rather than a game, 
where you really want the whole level in memory at once).

>   My recent programming history may also affect how I view these things,
> namely programming for handheld systems.

Cool. Yeah, I'm kind of impressed, actually. I'm writing code for set-top 
boxes right now, and the boss has me porting WebKit to the machine. He's all 
worried about the performance, and I'm saying "maybe you shouldn't have two 
layers of interpreted language on top of your 100MHz CPU just to draw your 
top-level menus."

Coming from the credit card terminal world, where having 14K of RAM was 
pretty much top-of-the-line, it's a little boggling to have a whole 128M of 
RAM. :-)

Handheld machines are definitely still the cutting edge of resource pain, 
tho, yes.  Especially since people expect them to do flashy things 
efficiently and cheaply and responsively.  CC terminals need to be quick, 
but nobody really cares if you have a 16x4 text-only B&W screen on one.

>>  Of course, the killer is the ratio of the number of nodes to the 
>> amount of memory, so sure, when you start getting up in the 2^25 number of 
>> nodes range the size of the nodes is going to be prohibitive on a 32-bit 
>> machine again.
> 
>   Hmm, I'm not so sure. 2^25 nodes, each node taking eg. 16 bytes, would
> require "only" 512 MB of RAM, which is well below the capacity of any
> modern desktop system.

Fair enough. It's getting close to the limit of what you can put in one 
address space conveniently, is all. (On a 32-bit machine, that is.) Of 
course you still have the malloc()/free() space overhead for each blob, 
whatever the rest of the program is doing, etc.

You can run into limits either way, with small memories or with Google-sized 
data. With my work, I tend to run into problems with Google-sized data, 
where the speed of even copying files from one place on the disk to another 
is a significant overhead. I can see how if you're looking at it from a game 
console or handheld device POV, you can see the space problem in a different 
way.

-- 
   Darren New, San Diego CA, USA (PST)
   Insanity is a small city on the western
   border of the State of Mind.

From: Invisible
Subject: Re: Nice data structure
Date: 22 Jun 2009 06:38:26
Message: <4a3f5f22$1@news.povray.org>

Warp wrote:

>   One problem which I see with the tree building algorithm presented there
> is that the tree is not balanced. The geometry of the tree will heavily
> depend on which order you insert the strings into it. A suboptimal insertion
> order will cause the tree to be heavily unbalanced.

As far as I can tell, it's only the letter selection part of the tree 
which can be unbalanced. E.g., if you have a million strings, you won't 
ever have to do a million comparisons to find a given string. You might 
end up doing 25 comparisons per character or something, but you'll never 
need to do a million.
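
To make that bound concrete, here is a minimal sketch. It assumes the structure under discussion behaves like a ternary search tree (a small binary tree over letters at each character position), that keys are non-empty lowercase strings, and it deliberately inserts keys in sorted order to force the worst-case shape. The names `TNode`, `tst_insert` and `tst_find` are made up for this example.

```python
from itertools import product
from string import ascii_lowercase


class TNode:
    def __init__(self, ch):
        self.ch = ch
        self.lo = self.eq = self.hi = None   # smaller / next char / larger
        self.end = False                     # True if a key ends at this node


def tst_insert(node, key, i=0):
    """Insert non-empty `key`; returns the (possibly new) subtree root."""
    ch = key[i]
    if node is None:
        node = TNode(ch)
    if ch < node.ch:
        node.lo = tst_insert(node.lo, key, i)
    elif ch > node.ch:
        node.hi = tst_insert(node.hi, key, i)
    elif i + 1 < len(key):
        node.eq = tst_insert(node.eq, key, i + 1)
    else:
        node.end = True
    return node


def tst_find(node, key):
    """Return (found, nodes_visited) for a non-empty key."""
    i = visited = 0
    while node is not None:
        visited += 1
        ch = key[i]
        if ch < node.ch:
            node = node.lo
        elif ch > node.ch:
            node = node.hi
        elif i + 1 < len(key):
            node, i = node.eq, i + 1
        else:
            return node.end, visited
    return False, visited


# Worst case: all 676 two-letter keys, inserted in sorted order.
root = None
for a, b in product(ascii_lowercase, repeat=2):
    root = tst_insert(root, a + b)

print(tst_find(root, "zz"))
```

Even in this degenerate layout the lookup cost is bounded by the alphabet size times the key length, not by the number of stored keys.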

(A common optimisation I've seen is to store the shared prefix in a tree 
and then any unique suffix is stored all in one lump rather than as tree 
nodes. As similar keys are added, the unique lump gets expanded out. 
This way, the work is proportional to the size of the shared key prefix, 
not the whole key.)
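
A minimal sketch of that optimisation, using a plain dictionary-based trie rather than any particular library's layout. The class and function names, and the field name `lump`, are hypothetical.

```python
class Node:
    def __init__(self):
        self.children = {}     # char -> Node, the shared-prefix part
        self.lump = None       # unexpanded unique suffix (non-empty str) or None
        self.terminal = False  # True if a key ends exactly at this node


def _hold(node, suffix):
    """Make a fresh node represent the remainder `suffix` of a single key."""
    if suffix:
        node.lump = suffix
    else:
        node.terminal = True


def insert(root, key):
    node, i = root, 0
    while True:
        rest = key[i:]
        if not rest:
            node.terminal = True
            return
        if node.lump == rest:
            return                          # key is already stored
        if node.lump is not None:
            # A similar key arrived: expand the lump by one character.
            old, node.lump = node.lump, None
            child = node.children[old[0]] = Node()
            _hold(child, old[1:])
        child = node.children.get(rest[0])
        if child is None:
            child = node.children[rest[0]] = Node()
            _hold(child, rest[1:])          # store the unique suffix in one lump
            return
        node, i = child, i + 1


def contains(root, key):
    node, i = root, 0
    while True:
        rest = key[i:]
        if not rest:
            return node.terminal
        if node.lump is not None:
            return node.lump == rest
        node = node.children.get(rest[0])
        if node is None:
            return False
        i += 1


def count_nodes(node):
    return 1 + sum(count_nodes(c) for c in node.children.values())


root = Node()
for word in ("povray", "povwin", "pov"):
    insert(root, word)
print(count_nodes(root))
```

With these three keys, only the shared prefix "pov" and the first differing letters become tree nodes; the tails "ay" and "in" stay as lumps until another key forces them to expand.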

>   However, I can think of a much easier solution which doesn't need any
> rebalancing:
> 
>   Always create every node so that it partitions the alphabet in two equal
> parts.

Assuming the range of permissible "characters" is small and known prior 
to starting, sure.

(You might use this algorithm, e.g., with IP addresses as keys, and then 
the key string elements aren't characters any more. But hey, if you're 
going to use fixed-width binary data, there are better structures 
available...)

From: Invisible
Subject: Re: Nice data structure
Date: 22 Jun 2009 10:23:55
Message: <4a3f93fb$1@news.povray.org>

Darren New wrote:
> http://www.pcplus.co.uk/node/3074/

I just implemented it... partial matching and all. ;-)

From: Warp
Subject: Re: Nice data structure
Date: 22 Jun 2009 12:20:18
Message: <4a3faf42@news.povray.org>

Invisible <voi### [at] devnull> wrote:
> As far as I can tell, it's only the letter selection part of the tree 
> which can be unbalanced. E.g., if you have a million strings, you won't 
> ever have to do a million comparisons to find a given string. You might 
> end up doing 25 comparisons per character or something, but you'll never 
> need to do a million.

  But if the strings themselves are very long, then it can make a
significant difference whether you have to make 25 comparisons per
character or 5.

-- 
                                                          - Warp

From: Orchid XP v8
Subject: Re: Nice data structure
Date: 22 Jun 2009 14:26:03
Message: <4a3fccbb$1@news.povray.org>

Warp wrote:
> Invisible <voi### [at] devnull> wrote:
>> As far as I can tell, it's only the letter selection part of the tree 
>> which can be unbalanced. E.g., if you have a million strings, you won't 
>> ever have to do a million comparisons to find a given string. You might 
>> end up doing 25 comparisons per character or something, but you'll never 
>> need to do a million.
> 
>   But if the strings themselves are very long, then it can make a
> significant difference whether you have to make 25 comparisons per
> character or 5.

Sure. I'm just saying, if the number of comparisons were dependent on 
the number of keys, you'd be in a whole other *complexity class*. That 
puts "really slow" into perspective.

-- 
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*

From: Orchid XP v8
Subject: Re: Nice data structure
Date: 29 Jun 2009 14:08:45
Message: <4a49032d@news.povray.org>

>> http://www.pcplus.co.uk/node/3074/
> 
> I just implemented it... partial matching and all. ;-)

...and then somebody on the Haskell mailing list announces that they're 
releasing this on Hackage. Go figure!

-- 
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*
