POV-Ray : Newsgroups : povray.off-topic : Nice data structure : Re: Nice data structure Server Time
5 Sep 2024 15:28:03 EDT (-0400)
  Re: Nice data structure  
From: Warp
Date: 20 Jun 2009 14:05:05
Message: <4a3d24d1@news.povray.org>
Darren New <dne### [at] sanrrcom> wrote:
> http://www.pcplus.co.uk/node/3074/

  One problem which I see with the tree building algorithm presented there
is that the tree is not balanced. The geometry of the tree will heavily
depend on which order you insert the strings into it. A suboptimal insertion
order will cause the tree to be heavily unbalanced.

  For example, assuming an alphabet of 26 characters, you may end up with
a tree such that if you try to search, for example, for the string "AAAA",
you first have to choose the left node 25 times before you find the first
one equal to A, after which you have to choose again the left node 25
times for the second A, and so on.

  However, if the tree was fully balanced, then you would only have to
choose the left node 5 times (or less) for each of the A's (because
log2(26) = 5). The problem of course accentuates itself with larger
alphabets (eg. if you had a full byte alphabet, ie. 256 distinct "letters").

  Thinking about the problem, I see two possible solutions: You could come
up with a really contrived solution, similar to how binary search trees are
kept balanced (eg. by using the same techniques as with a red-black tree).

  However, I can think of a much easier solution which doesn't need any
rebalancing:

  Always create every node so that it partitions the alphabet in two equal
parts. In other words, assuming an alphabet of A-Z, the root node would
*always* be N. Then create new nodes as needed (for the string to be
inserted): If the first character of the new string is less then N, then
create a G node at the left of the root (regardless of what the first
character of the string was). If it's larger, create a T node at the right.
If it was equal, then create an new N node as the "equal" child node for
the root. If the first character of the string was not N, keep splitting
like this until you end up with the character in question, and then always
create a new N node as a new child node for that. Repeat this until the
entire string has been inserted.

  There will be some extra nodes (something line O(log2(n)) extra nodes
per character), but the tree will always be fully balanced, and immediately
when a new string is added which already has an existing node in the tree,
it will benefit from it. In an optimal case the amount of extraneous nodes
in the tree will be quite small compared to the amount of "true" nodes.

-- 
                                                          - Warp


Post a reply to this message

Copyright 2003-2023 Persistence of Vision Raytracer Pty. Ltd.