|
 |
Warp wrote:
> Darren New <dne### [at] san rr com> wrote:
>> I think the way the original paper worked it is to store the '\0' as part of
>> the string in the tree. So your CAT vs CATS would be
>> -->C-->A-->T-->S-->\0
>> |
>> v
>> \0
>
> But that's not a null pointer indicating the end of the string. That's
> a regular node containing a special character which indicates the end of
> the string. Implementing my automatically-balanced tree creation idea can
> use the exact same method for marking ends of strings.
Then we're in agreement. I was using \0 to indicate ASCII NUL, not C's NULL
pointer. You can either use a node whose character value is '\0', or you
can use a separate bit, depending on your needs.
> Btw, the node containing an end-of-string value idea works only as long
> as there's a special character value which never appears in a string. It
> can't be done like that if the range of values of each character in your
> "string" can hold is the full 0-255 range. Of course you could use values
> larger than 8 bits, but that's just basically the exact same thing as having
> an end-of-string bit at each node.
Yep. Or you could have another pointer that points to the data that is
associated with that string (i.e., the "value" of the "key/value" pair), and
if that's not NULL, then a string ends at that node.
> 16 bytes to store one single character sounds quite a lot... :)
And a fourth pointer for the value associated, yes. Of course, any tree
structure is going to be close to the same size too. If you have lots and
lots of strings, and you're more worried about space than speed, a hash
table works out better, I expect.
> Of course strings with equal beginnings will share nodes, but if the
> strings are very diverse, they will still end up using quite a lot of
> memory.
If you're using up enough memory to be worried about it, chances are you
have enough strings that the beginnings will overlap a lot? I mean, unless
you're really memory-constrained.
> In fact, since no character can be smaller than '\0', you could also
> have additional '\0' nodes on the left branch, each one pointing to
> associated data. That way you can have a multiset, ie. the same key
> pointing to multiple data, rather than just one.
Hmmmmm. Well, certainly the '\0' node could point to a different structure
than the other nodes. You already have three pointers there to play with,
and you could reasonably hang an entire tree underneath if you wanted.
--
Darren New, San Diego CA, USA (PST)
Insanity is a small city on the western
border of the State of Mind.
Post a reply to this message
|
 |