POV-Ray : Newsgroups : povray.off-topic : Nice data structure Server Time
5 Sep 2024 19:28:31 EDT (-0400)
  Nice data structure (Message 11 to 20 of 25)  
<<< Previous 10 Messages Goto Latest 10 Messages Next 5 Messages >>>
From: Warp
Subject: Re: Nice data structure
Date: 20 Jun 2009 18:09:14
Message: <4a3d5e0a@news.povray.org>
Darren New <dne### [at] sanrrcom> wrote:
> Warp wrote:
> > Each node requires three pointers plus the character itself.

> Or, with enough work, you could get away with not using full-sized pointers, 
> perhaps. Indexes into arrays (with said arrays split into chunks if you have 
> trouble finding that much contiguous address space) might work too.

  I suppose that if you limit the size of the data container to 2^24 nodes,
you could use "pointers" of 3 bytes, which would be 9 bytes plus the
character. Alignment usually requires to round that up to 12 bytes. (The
extra space could be used for the end-of-string bit.)

  If you limited the number of nodes to 2^16 then you could use 8 bytes
per node. However, I'm not sure if such a limitation is very practical.

  Of course if you are willing to sacrifice some speed in order to squeeze
space, you could break the word/half-word boundaries, and compact eg. the
three 24-bit pointers and the 8-bit character into a 10-byte node in a
byte array. Or if you are not afraid of bit-fu, something more exotic like
21 bits for the "pointers" (giving a maximum of 2097152 nodes), 8 bits for
the characters and 1 bit for the end-of-string bit, which would result in
9 bytes per node.

-- 
                                                          - Warp


Post a reply to this message

From: Darren New
Subject: Re: Nice data structure
Date: 20 Jun 2009 18:35:45
Message: <4a3d6441@news.povray.org>
Warp wrote:
>   You can't use the "right child" pointer of the '\0' node for ancillary
> data because it can be part of the regular tree structure.

True, true. :-)

-- 
   Darren New, San Diego CA, USA (PST)
   Insanity is a small city on the western
   border of the State of Mind.


Post a reply to this message

From: Darren New
Subject: Re: Nice data structure
Date: 20 Jun 2009 18:40:25
Message: <4a3d6559$1@news.povray.org>
Warp wrote:
>   I suppose that if you limit the size of the data container to 2^24 nodes,

No, you could use "offsets to next element" or something, too, methinks. But 
yeah, that's kind of the thing I was thinking about. By the time you have 
more than 2^16 nodes, you're probably worrying more about speed, paging, etc 
than you are about raw size, perhaps. And you're going to start to see a lot 
of overlap in the earlier parts of the strings.

-- 
   Darren New, San Diego CA, USA (PST)
   Insanity is a small city on the western
   border of the State of Mind.


Post a reply to this message

From: clipka
Subject: Re: Nice data structure
Date: 20 Jun 2009 20:40:00
Message: <web.4a3d808c8e78d9a973b318a20@news.povray.org>
Warp <war### [at] tagpovrayorg> wrote:
> Darren New <dne### [at] sanrrcom> wrote:
>   However, I can think of a much easier solution which doesn't need any
> rebalancing:
>
>   Always create every node so that it partitions the alphabet in two equal
> parts. In other words, assuming an alphabet of A-Z, the root node would
> *always* be N.

This would require your data to have a linear distribution.

What if all your keys had the form "img00000000002AC73F.jpg"?


As a matter of fact, there already are good tree concepts out there that include
automatic balancing without any a-priori knowledge of the data. Not perfect, but
with a certain guaranteed minimum quality. Red-Black-Trees are a quite popular
example. I'd expect the same principle to be adaptable to ternary trees, too.


Post a reply to this message

From: Darren New
Subject: Re: Nice data structure
Date: 21 Jun 2009 01:18:18
Message: <4a3dc29a$1@news.povray.org>
clipka wrote:
> Red-Black-Trees are a quite popular example. 

Skip lists are also a cool data structure for when you have a lot of data 
you'd like to put in a tree but don't want to balance things etc. 
Generalized to skip graphs in the same sense that a DHT generalizes a hashtable.

-- 
   Darren New, San Diego CA, USA (PST)
   Insanity is a small city on the western
   border of the State of Mind.


Post a reply to this message

From: Warp
Subject: Re: Nice data structure
Date: 21 Jun 2009 09:57:55
Message: <4a3e3c63@news.povray.org>
Darren New <dne### [at] sanrrcom> wrote:
> Warp wrote:
> >   I suppose that if you limit the size of the data container to 2^24 nodes,

> No, you could use "offsets to next element" or something, too, methinks. But 
> yeah, that's kind of the thing I was thinking about. By the time you have 
> more than 2^16 nodes, you're probably worrying more about speed, paging, etc 
> than you are about raw size, perhaps. And you're going to start to see a lot 
> of overlap in the earlier parts of the strings.

  I'd say it's the exact opposite: When you start having more than 2^16
nodes, that's when you need to star worrying about the amount of space the
data structure is taking. With less than 2^16 nodes the size of an individual
node is rather irrelevant.

-- 
                                                          - Warp


Post a reply to this message

From: Warp
Subject: Re: Nice data structure
Date: 21 Jun 2009 10:01:13
Message: <4a3e3d29@news.povray.org>
clipka <nomail@nomail> wrote:
> This would require your data to have a linear distribution.

  No, it doesn't. For an alphabet of eg. 26 characters you need to perform
at most 5 steps per character, period, completely regardless of what the
stored strings are.

  With an unbalanced tree the worst case scenario is that you end up having
to perform 25 steps per character.

-- 
                                                          - Warp


Post a reply to this message

From: Darren New
Subject: Re: Nice data structure
Date: 21 Jun 2009 13:38:08
Message: <4a3e7000$1@news.povray.org>
Warp wrote:
>   I'd say it's the exact opposite: When you start having more than 2^16
> nodes, that's when you need to star worrying about the amount of space the
> data structure is taking. With less than 2^16 nodes the size of an individual
> node is rather irrelevant.

My thought was that if you have that much data, you're probably running on a 
machine where the memory limits aren't very tight (i.e., a desktop system, 
say).  Of course, the killer is the ratio of the number of nodes to the 
amount of memory, so sure, when you start getting up in the 2^25 number of 
nodes range the size of the nodes is going to be prohibitive on a 32-bit 
machine again.

The nice thing is the tree doesn't need rotations, so you can clip off a 
large sub-tree of the tree and put it in its own address space. That's hard 
to do if you have to balance nodes and stuff.

-- 
   Darren New, San Diego CA, USA (PST)
   Insanity is a small city on the western
   border of the State of Mind.


Post a reply to this message

From: Warp
Subject: Re: Nice data structure
Date: 21 Jun 2009 14:27:55
Message: <4a3e7baa@news.povray.org>
Darren New <dne### [at] sanrrcom> wrote:
> Warp wrote:
> >   I'd say it's the exact opposite: When you start having more than 2^16
> > nodes, that's when you need to star worrying about the amount of space the
> > data structure is taking. With less than 2^16 nodes the size of an individual
> > node is rather irrelevant.

> My thought was that if you have that much data, you're probably running on a 
> machine where the memory limits aren't very tight (i.e., a desktop system, 
> say).

  Well, I may be biased because of my programming history background, but
usually an app crunching tons of data requires space-efficient data containers
(so that it can crunch as much data as possible with a given amount of RAM).
You usually want your app to be able to handle as much data as possible.

  Also my recent programming history can also have some effect on how I view
these things, namely programming for handheld systems. For example, games
which require a full (eg. English) dictionary are an extremely typical case
where you need a very fast data container which should take as little space
as possible (because a handheld system won't have a multi-GHz CPU nor
gigabytes of RAM). A typical English dictionary is excruciatingly annoying
in this regard because they usually go just a bit over the magical limit
of 65536 words, especially if the game requires plurals and other inflected
forms. (And even if the dictionary did have under 65536 words, the total
amount of character data would still be well beyond that limit, and since
words are of differing lengths you really need byte-pointers/indices.)

>  Of course, the killer is the ratio of the number of nodes to the 
> amount of memory, so sure, when you start getting up in the 2^25 number of 
> nodes range the size of the nodes is going to be prohibitive on a 32-bit 
> machine again.

  Hmm, I'm not so sure. 2^25 nodes, each node taking eg. 16 bytes, would
require "only" 512 MB of RAM, which is well below the capacity of any
modern desktop system.

-- 
                                                          - Warp


Post a reply to this message

From: Darren New
Subject: Re: Nice data structure
Date: 21 Jun 2009 17:57:45
Message: <4a3eacd9$1@news.povray.org>
Warp wrote:
> You usually want your app to be able to handle as much data as possible.

Yeah. My background tends towards needing to crunch arbitrary amounts of 
data, and hence I look at data structures that cache to disk more 
efficiently. The sorts of stuff I write can't rely on having enough memory 
to hold it all, but also tend to assume you're not going to look at all of 
it at once (like a database of customers, for example, rather than a game, 
where you really want the whole level in memory at once).

>   Also my recent programming history can also have some effect on how I view
> these things, namely programming for handheld systems. 

Cool. Yeah, I'm kind of impressed, actually. I'm writing code for set top 
boxes right now, and the boss has me porting webkit to the machine. He's all 
worried about the performance, and I'm saying "maybe you shouldn't have two 
layers of interpreted language on top of your 100MHz CPU just to draw your 
top-level menus."

Coming from the credit card terminal world, where having 14K of ram was 
pretty much top-of-the-line, it's a little boggling to have a whole 128M of 
RAM. :-)

Handheld machines are definitely still the cutting edge of resource pain, 
tho, yes.  Especially since people expect them to do flashy things 
efficiently and cheaply and responsively.  CC terminals need to be quick, 
but nobody really cares if you have a 16x4 text-only B&W screen on one.

>>  Of course, the killer is the ratio of the number of nodes to the 
>> amount of memory, so sure, when you start getting up in the 2^25 number of 
>> nodes range the size of the nodes is going to be prohibitive on a 32-bit 
>> machine again.
> 
>   Hmm, I'm not so sure. 2^25 nodes, each node taking eg. 16 bytes, would
> require "only" 512 MB of RAM, which is well below the capacity of any
> modern desktop system.

Fair enough. It's getting close to the limit of what you can put in one 
address space conveniently, is all. (On a 32-bit machine, that is.) Of 
course you still have the malloc()/free() space overhead for each blob, and 
whatever the rest of the program is doing, and etc.

You can run into limits either way, with small memories or with google-sized 
data. With my work, I tend to run into problems with google-sized data, 
where the speed of even copying files from one place on the disk to another 
is a significant overhead. I can see how if you're looking at it from a game 
console or handheld device POV, you can see the space problem in a different 
way.

-- 
   Darren New, San Diego CA, USA (PST)
   Insanity is a small city on the western
   border of the State of Mind.


Post a reply to this message

<<< Previous 10 Messages Goto Latest 10 Messages Next 5 Messages >>>

Copyright 2003-2023 Persistence of Vision Raytracer Pty. Ltd.