POV-Ray : Newsgroups : povray.off-topic : Nice data structure : Re: Nice data structure Server Time
5 Sep 2024 15:27:50 EDT (-0400)
  Re: Nice data structure  
From: Warp
Date: 20 Jun 2009 16:20:22
Message: <4a3d4486@news.povray.org>
Darren New <dne### [at] sanrrcom> wrote:
> I think the way the original paper worked it is to store the '\0' as part of 
> the string in the tree. So your CAT vs CATS would be
> -->C-->A-->T-->S-->\0
>             |
>             v
>            \0

  But that's not a null pointer indicating the end of the string. That's
a regular node containing a special character which indicates the end of
the string. Implementing my automatically-balanced tree creation idea can
use the exact same method for marking ends of strings.

  Btw, the node containing an end-of-string value idea works only as long
as there's a special character value which never appears in a string. It
can't be done like that if the range of values of each character in your
"string" can hold is the full 0-255 range. Of course you could use values
larger than 8 bits, but that's just basically the exact same thing as having
an end-of-string bit at each node. (And given that the "character" in the
node will probably take 4 bytes for alignment purposes anyways, there's
plenty of unused bits.)

  Which got me thinking: This data container uses quite a lot of space,
actually. Each node requires three pointers plus the character itself.
In a 32-bit system the pointers will be 4 bytes each, and as said, memory
alignment requires for the character to take 4 bytes as well, for a grand
total of 16 bytes per node (and this assuming you are storing each node
optimally in an array rather than allocating each node separately).
16 bytes to store one single character sounds quite a lot... :)

  Of course strings with equal beginnings will share nodes, but if the
strings are very diverse, they will still end up using quite a lot of
memory.

> If you have '\0' nodes, 
> you could special-case the "next character" (i.e., middle) pointer to point 
> to the data you want to associate with the word.

  In fact, since no character can be smaller than '\0', you could also
have additional '\0' nodes on the left branch, each one pointing to
associated data. That way you can have a multiset, ie. the same key
pointing to multiple data, rather than just one.

-- 
                                                          - Warp


Post a reply to this message

Copyright 2003-2023 Persistence of Vision Raytracer Pty. Ltd.