|
 |
Warp wrote:
> Darren New <dne### [at] san rr com> wrote:
>> I think that would work with the caveat that you have to distinguish between
>> nodes without a child and nodes whose only child is '\0'. That is, not all
>> your branches will end in a '\0' as shown in the picture.
>
> Each node needs a bit of information denoting whether it's an end node
> or not (ie. whether that node is the last character of a string stored in
> the data structure or not) regardless. For example, if you store "CAT" and
> "CATS" as two separate strings, you'll need a bit in the 'T' node to
> indicate that there exists a string which end there.
I think the way the original paper worked it is to store the '\0' as part of
the string in the tree. So your CAT vs CATS would be
-->C-->A-->T-->S-->\0
|
v
\0
You could of course represent it as a separate bit instead.
> Doing it like I suggested doesn't change anything. For the extraneous
> nodes you simply set the bit to "no string ends here".
Yes, that's another way to do it. However, if you want to associate some
data with each string (e.g., you're using it in place of a hashtable map,
instead of in place of a list of strings like a spelling checker might use),
you'll want a place to store the pointer associated with the "string ends
here" anyway, and it might be more space-efficient to have a '\0' node to
tack that pointer onto than it would be to have that pointer on every
character whether it's the end of a string or not. If you have '\0' nodes,
you could special-case the "next character" (i.e., middle) pointer to point
to the data you want to associate with the word.
I wonder how ugly this would be to implement functionally.
--
Darren New, San Diego CA, USA (PST)
Insanity is a small city on the western
border of the State of Mind.
Post a reply to this message
|
 |