Warp wrote:
> One problem which I see with the tree building algorithm presented there
> is that the tree is not balanced. The geometry of the tree will heavily
> depend on which order you insert the strings into it. A suboptimal insertion
> order will cause the tree to be heavily unbalanced.
As far as I can tell, it's only the letter-selection part of the tree
that can become unbalanced. E.g., if you have a million strings, you'll
never have to do a million comparisons to find a given string. You might
end up doing 25 comparisons per character or something, but you'll never
need to do a million.
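If the structure under discussion is a ternary search tree (which the "letter selection" wording suggests), a minimal sketch might look like this — a speculative illustration with names of my own choosing, not code from the article. Each node holds one character plus lo/eq/hi links, so the letter selection at each string position is a small BST over characters rather than a scan of the whole alphabet:

```python
class Node:
    """One node of a ternary search tree (illustrative sketch)."""
    def __init__(self, ch):
        self.ch = ch
        self.lo = self.eq = self.hi = None
        self.is_key = False   # True if a stored string ends here

def insert(node, s, i=0):
    ch = s[i]
    if node is None:
        node = Node(ch)
    if ch < node.ch:
        node.lo = insert(node.lo, s, i)       # stay on same character
    elif ch > node.ch:
        node.hi = insert(node.hi, s, i)       # stay on same character
    elif i + 1 < len(s):
        node.eq = insert(node.eq, s, i + 1)   # advance to next character
    else:
        node.is_key = True
    return node

def contains(node, s, i=0):
    if node is None:
        return False
    ch = s[i]
    if ch < node.ch:
        return contains(node.lo, s, i)
    if ch > node.ch:
        return contains(node.hi, s, i)
    if i + 1 < len(s):
        return contains(node.eq, s, i + 1)
    return node.is_key
```

Lookup cost is then (depth of the per-position character BST) comparisons per character — which is exactly the part that insertion order can leave unbalanced.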
(A common optimisation I've seen is to store the shared prefix as tree
nodes, while any unique suffix is stored in one lump rather than being
expanded into nodes. As similar keys are added, the unique lump gets
expanded out. This way, the work is proportional to the length of the
shared key prefix, not the whole key.)
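A rough sketch of that "suffix in one lump" optimisation (my own illustration, not the article's code): inner nodes here are plain dicts keyed by character, a leaf keeps the rest of one key as a single string, and a colliding leaf is expanded one character at a time — so insertion work tracks the shared prefix, not the full key length:

```python
# A tree is either:
#   ("leaf", suffix)   -- the rest of one key, stored in one lump
#   ("node", children) -- children: dict mapping char -> subtree,
#                         with "" -> True marking end-of-key

def insert(node, key):
    if node is None:
        return ("leaf", key)            # store whole suffix as one lump
    if node[0] == "leaf":
        if node[1] == key:
            return node                 # key already present
        # Collision: expand one level and re-insert both suffixes.
        expanded = insert(("node", {}), node[1])
        return insert(expanded, key)
    _, children = node
    if key == "":
        children[""] = True             # a stored key ends here
    else:
        children[key[0]] = insert(children.get(key[0]), key[1:])
    return node

def contains(node, key):
    if node is None:
        return False
    if node[0] == "leaf":
        return node[1] == key
    _, children = node
    if key == "":
        return children.get("") is True
    return contains(children.get(key[0]), key[1:])
```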
> However, I can think of a much easier solution which doesn't need any
> rebalancing:
>
> Always create every node so that it partitions the alphabet in two equal
> parts.
Assuming the range of permissible "characters" is small and known prior
to starting, sure.
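Warp's scheme, as I read it (assuming the fixed alphabet 'a'..'z'): always split the remaining alphabet range at its midpoint, so the shape of the character-selection tree is fixed by the alphabet rather than by insertion order. A toy sketch that just counts the comparisons:

```python
# Toy sketch of the fixed-partition idea (assumes keys over 'a'..'z').
# Counts how many midpoint comparisons it takes to narrow the alphabet
# range down to a single character.
def locate(ch, lo="a", hi="z"):
    steps = 0
    while lo < hi:
        mid = chr((ord(lo) + ord(hi)) // 2)
        steps += 1
        if ch <= mid:
            hi = mid                    # keep the lower half
        else:
            lo = chr(ord(mid) + 1)      # keep the upper half
    return steps
```

Every character is located within ceil(log2 26) = 5 comparisons, no matter what order the keys arrive in — which is the point of the suggestion.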
(You might use this algorithm, e.g., with IP addresses as keys, and then
the key string elements aren't characters any more. But hey, if you're
going to use fixed-width binary data, there are better structures
available...)
Darren New wrote:
> http://www.pcplus.co.uk/node/3074/
I just implemented it... partial matching and all. ;-)
Invisible <voi### [at] devnull> wrote:
> As far as I can tell, it's only the letter selection part of the tree
> which can be unbalanced. E.g., if you have a million strings, you won't
> ever have to do a million comparisons to find a given string. You might
> end up doing 25 comparisons per character or something, but you'll never
> need to do a million.
But if the strings themselves are very long, then it can make a
significant difference whether you have to make 25 comparisons per
character or 5.
--
- Warp
Warp wrote:
> Invisible <voi### [at] devnull> wrote:
>> As far as I can tell, it's only the letter selection part of the tree
>> which can be unbalanced. E.g., if you have a million strings, you won't
>> ever have to do a million comparisons to find a given string. You might
>> end up doing 25 comparisons per character or something, but you'll never
>> need to do a million.
>
> But if the strings themselves are very long, then it can make a
> significant difference whether you have to make 25 comparisons per
> character or 5.
Sure. I'm just saying, if the number of comparisons were dependent on
the number of keys, you'd be in a whole other *complexity class*. That
puts "really slow" into perspective.
--
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*
>> http://www.pcplus.co.uk/node/3074/
>
> I just implemented it... partial matching and all. ;-)
...and then somebody on the Haskell mailing list announces that they're
releasing this on Hackage. Go figure!
--
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*