Warp wrote:
> One problem which I see with the tree building algorithm presented there
> is that the tree is not balanced. The geometry of the tree will heavily
> depend on which order you insert the strings into it. A suboptimal insertion
> order will cause the tree to be heavily unbalanced.
As far as I can tell, it's only the letter-selection part of the tree
which can be unbalanced. E.g., if you have a million strings, you'll
never need a million comparisons to find a given string. You might end
up doing 25-odd comparisons per character (scanning the possible
letters at each node), but you'll never need to do a million.
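To make that concrete, here's a minimal trie sketch in Python (names hypothetical): the lookup cost is bounded by key length times the per-node branching, never by the total number of stored strings.

```python
# Minimal trie sketch. Looking up a key touches one node per
# character; the number of stored strings never enters into it.

class TrieNode:
    def __init__(self):
        self.children = {}   # one entry per distinct next letter
        self.terminal = False  # True if a key ends at this node

def insert(root, key):
    node = root
    for ch in key:
        node = node.children.setdefault(ch, TrieNode())
    node.terminal = True

def contains(root, key):
    node = root
    for ch in key:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.terminal
```

With a dict per node the letter selection is effectively constant-time; a linked list of children would give the "25 comparisons per character" worst case instead.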
(A common optimisation I've seen is to store the shared prefix in the
tree and keep any unique suffix in a single lump rather than as
individual tree nodes. As similar keys are added, the lump gets
expanded out into nodes. This way, the work is proportional to the
length of the shared key prefix, not the whole key.)
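That lump optimisation is essentially a path-compressed (radix) trie. A rough Python sketch, with hypothetical names: a unique suffix is stored as one edge label, and only gets split into nodes when a similar key arrives.

```python
# Path-compressed trie sketch: edges carry whole substrings, and a
# lump is split at the shared prefix only when a similar key is added.

class Node:
    def __init__(self, terminal=False):
        self.edges = {}        # first character -> (label, child node)
        self.terminal = terminal

def _common_prefix_len(a, b):
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n

def insert(node, key):
    if not key:
        node.terminal = True
        return
    edge = node.edges.get(key[0])
    if edge is None:
        # whole unique suffix stored as one lump
        node.edges[key[0]] = (key, Node(terminal=True))
        return
    label, child = edge
    p = _common_prefix_len(label, key)
    if p == len(label):
        insert(child, key[p:])           # edge fully matched; descend
    else:
        # similar key added: split the lump at the shared prefix
        mid = Node()
        mid.edges[label[p]] = (label[p:], child)
        node.edges[key[0]] = (label[:p], mid)
        insert(mid, key[p:])

def contains(node, key):
    while key:
        edge = node.edges.get(key[0])
        if edge is None:
            return False
        label, child = edge
        if not key.startswith(label):
            return False
        key = key[len(label):]
        node = child
    return node.terminal
```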
> However, I can think of a much easier solution which doesn't need any
> rebalancing:
>
> Always create every node so that it partitions the alphabet in two equal
> parts.
Assuming the range of permissible "characters" is small and known prior
to starting, sure.
(You might use this algorithm, e.g., with IP addresses as keys, and then
the key string elements aren't characters any more. But hey, if you're
going to use fixed-width binary data, there are better structures
available...)
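Warp's equal-partition idea can be sketched like this, assuming a fixed 'a'..'z' alphabet (names hypothetical): each character always resolves to the same left/right path through a binary split of the letter range, so the shape of the letter-selection structure never depends on insertion order.

```python
# Each node halves the remaining letter range, so picking a child is a
# binary search over the alphabet: ~log2(26) = 5 steps per character,
# and the partition is fixed in advance, independent of insertion order.

def char_path(ch, lo='a', hi='z'):
    """Left/right decisions for one character through the fixed
    binary partition of the alphabet."""
    path = []
    lo, hi, c = ord(lo), ord(hi), ord(ch)
    while lo < hi:
        mid = (lo + hi) // 2
        if c <= mid:
            path.append(0)
            hi = mid
        else:
            path.append(1)
            lo = mid + 1
    return path

def encode(key):
    # Flatten each character's partition path into one bit sequence.
    return [b for ch in key for b in char_path(ch)]

def insert(root, key):
    node = root
    for b in encode(key):
        node = node.setdefault(b, {})
    node["$"] = True           # mark end of a stored key

def contains(root, key):
    node = root
    for b in encode(key):
        node = node.get(b)
        if node is None:
            return False
    return "$" in node
```

Of course, for fixed-width binary keys such as IP addresses this degenerates into a plain bitwise trie, which is exactly the situation where other structures start to look better.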