OK, so some of the people here actually learned computer science during
their computer science qualification. (Unlike me, who only learned IT. :-P )
I'm using binary search trees to implement sets. My question is this: Is
there a way to compute a set union, faster than just inserting all the
elements of one set into the other set?
Obviously if all the elements of one set are larger than the elements of
the other, we can just bolt the two trees together (although it might be
necessary to rebalance, and you need to find a root element from somewhere).
Consider, for example, the two trees:
  3       4
 / \     / \
1   5   2   6
The union of these is
    4
   / \
  2   5
 / \   \
1   3   6
(Alternatively, 3 can be the root, yielding a mirror image of the above
tree.)
I'm not seeing any way of building this result that's faster than just
repeatedly inserting the elements of one tree into the other and
rebalancing.
I see two possibilities:
- I'm too stupid to see the shortcut.
- There /is/ no shortcut.
Does anyone know the answer? (I'm sure there's something in Knuth. :-P )
Invisible <voi### [at] devnull> wrote:
> I'm using binary search trees to implement sets. My question is this: Is
> there a way to compute a set union, faster than just inserting all the
> elements of one set into the other set?
From a computational complexity point of view, no. Rationale: Every time
you insert an element to the tree, the tree has to be re-balanced (which is
necessary because if you don't, it won't be a valid *search* tree anymore,
where every subnode on the left of a node will be smaller or equal to the
node, and each subnode on the right will be larger or equal to the node),
and the fastest you can re-balance a tree is O(log n). Hence the worst case
scenario for inserting one tree into another is O(n log n). There may be
best-case scenarios where a particular tree could be inserted in a particular
other tree faster, but that doesn't change the upper bound (for the worst
case scenario, and probably not even for the average scenario).
While the computational complexity cannot be reduced, if you are looking
at reducing the constant factor (what's colloquially called "hacker
optimization") then you could probably traverse both trees from smallest
to largest nodes linearly, and insert from one tree to the other as
needed. Since in each insertion you don't need to perform a full search
starting from the root node, it might be slightly faster by some constant
factor. (The rebalancing still adds the O(log n) complexity to each
insertion, though, even if you find a way to insert more than one node
with one single re-balancing in some cases.)
--
- Warp
On 30/01/2012 04:22 PM, Warp wrote:
> Invisible<voi### [at] devnull> wrote:
>> I'm using binary search trees to implement sets. My question is this: Is
>> there a way to compute a set union, faster than just inserting all the
>> elements of one set into the other set?
>
> From a computational complexity point of view, no.
Well, at least that means it's not that I'm just too stupid to find
it... :-}
> Rationale: Every time
> you insert an element to the tree, the tree has to be re-balanced (which is
> necessary because if you don't, it won't be a valid *search* tree anymore,
You seem to be confusing balance and validity.
A tree is a valid binary search tree if all the values in the left
subtree are lower, and all the values in the right subtree are higher.
That's quite trivial to achieve. A tree is *balanced* if both subtrees
are roughly the right size - and an unbalanced tree usually means lower
performance. (Depending on what you want to do to it. Certainly this is
the case for a BST.)
> and the fastest you can re-balance a tree is O(log n). Hence the worst case
> scenario for inserting one tree into another is O(n log n).
Right, OK.
> There may be
> best-case scenarios where a particular tree could be inserted in a particular
> other tree faster, but that doesn't change the upper bound (for the worst
> case scenario, and probably not even for the average scenario).
Fewer tree manipulations, in exchange for more decision processing.
Looks like it probably wouldn't come out any faster.
> While the computational complexity cannot be reduced, if you are looking
> at reducing the constant factor (what's colloquially called "hacker
> optimization") then you could probably traverse both trees from smallest
> to largest nodes linearly, and insert from one tree to the other as
> needed. Since in each insertion you don't need to perform a full search
> starting from the root node, it might be slightly faster by some constant
> factor. (The rebalancing still adds the O(log n) complexity to each
> insertion, though, even if you find a way to insert more than one node
> with one single re-balancing in some cases.)
Hmm, OK... I'll see what I can come up with.
Presumably set intersection can't easily be accelerated either?
On 1/30/2012 8:22 AM, Warp wrote:
> Invisible<voi### [at] devnull> wrote:
>> I'm using binary search trees to implement sets. My question is this: Is
>> there a way to compute a set union, faster than just inserting all the
>> elements of one set into the other set?
>
> From a computational complexity point of view, no. Rationale: Every time
> you insert an element to the tree, the tree has to be re-balanced
I don't follow this reasoning, since you only need the resulting tree to
be a valid BST at the end of the computation, but it can take any
convenient state when you're adding the individual elements. For
instance is there some reason why the following algorithm sketch doesn't
work?
1) Flatten BSTs into sorted arrays / linked lists O(n+m)
2) Merge arrays O(n+m)
3) Rebuild BST from sorted array O(m+n)
That should give you O(m+n) time to perform the union, which is better
than Andrew's suggestion.
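For concreteness, here's a minimal Python sketch of the three steps (the `Node` class and all function names are illustrative, not from the thread or any particular library):

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def flatten(node, out):
    """Step 1: in-order traversal emits a BST's values in sorted order.  O(n)."""
    if node is not None:
        flatten(node.left, out)
        out.append(node.value)
        flatten(node.right, out)

def merge(a, b):
    """Step 2: merge two sorted lists, dropping duplicates (set union).  O(n+m)."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            out.append(a[i]); i += 1
        elif b[j] < a[i]:
            out.append(b[j]); j += 1
        else:                          # equal element in both sets: keep one copy
            out.append(a[i]); i += 1; j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out

def rebuild(xs, lo=0, hi=None):
    """Step 3: build a balanced BST from a sorted array by bisection.  O(n)."""
    if hi is None:
        hi = len(xs)
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    return Node(xs[mid], rebuild(xs, lo, mid), rebuild(xs, mid + 1, hi))

def union(t1, t2):
    """Whole pipeline: flatten both trees, merge, rebuild.  O(n+m)."""
    xs, ys = [], []
    flatten(t1, xs)
    flatten(t2, ys)
    return rebuild(merge(xs, ys))
```

Each step touches every element a constant number of times, which is where the overall O(n+m) bound comes from.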
On 30/01/2012 05:08 PM, Kevin Wampler wrote:
> I don't follow this reasoning, since you only need the resulting tree to
> be a valid BST at the end of the computation, but it can take any
> convenient state when you're adding the individual elements. For
> instance is there some reason why the following algorithm sketch doesn't
> work?
>
> 1) Flatten BSTs into sorted arrays / linked lists O(n+m)
> 2) Merge arrays O(n+m)
> 3) Rebuild BST from sorted array O(m+n)
>
> That should give you O(m+n) time to perform the union, which is better
> than Andrew's suggestion.
Question: If you have a bunch of data in sorted order, how do you build
a BST from that faster than O(n log n) time?
(I can kind of see how that would work if the data is in an array;
arrays support random access, so you could just binary chop the array
and keep the results as a BST. I'm trying to figure out if you can do
anything with a linked list, which does /not/ support random access...)
On 31/01/2012 10:23, Invisible wrote:
> On 30/01/2012 05:08 PM, Kevin Wampler wrote:
>
>> I don't follow this reasoning, since you only need the resulting tree to
>> be a valid BST at the end of the computation, but it can take any
>> convenient state when you're adding the individual elements. For
>> instance is there some reason why the following algorithm sketch doesn't
>> work?
>>
>> 1) Flatten BSTs into sorted arrays / linked lists O(n+m)
>> 2) Merge arrays O(n+m)
>> 3) Rebuild BST from sorted array O(m+n)
>>
>> That should give you O(m+n) time to perform the union, which is better
>> than Andrew's suggestion.
>
> Question: If you have a bunch of data in sorted order, how do you build
> a BST from that faster than O(n log n) time?
>
> (I can kind of see how that would work if the data is in an array;
> arrays support random access, so you could just binary chop the array
> and keep the results as a BST. I'm trying to figure out if you can do
> anything with a linked list, which does /not/ support random access...)
Use a model BST, filled with 1 to n+m (or whatever pleases you; you just
need n+m values).
Then run through the BST in order, and replace each element with the head
of the list, removing it as you go. (It's not the traditional BST replace,
which might rebalance, just a basic substitution.)
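A minimal Python sketch of this trick (all names here are my own, purely for illustration): pre-build a balanced "shape" of n+m dummy nodes, then overwrite its values during an in-order walk, pulling from the sorted stream:

```python
class Node:
    def __init__(self, value=None, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def model_tree(n):
    """Build a balanced shape of n placeholder nodes.  O(n)."""
    if n == 0:
        return None
    half = (n - 1) // 2
    return Node(None, model_tree(half), model_tree(n - 1 - half))

def fill_in_order(node, values):
    """Walk the shape in order, substituting the next value from the sorted
    stream into each node -- a plain substitution, no rebalancing ever."""
    if node is not None:
        fill_in_order(node.left, values)
        node.value = next(values)
        fill_in_order(node.right, values)
```

Usage would be something like `t = model_tree(6)` followed by `fill_in_order(t, iter(sorted_values))`; because the values arrive in sorted order and the shape is visited in order, the result is a valid, balanced BST, and both passes are linear.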
>> Question: If you have a bunch of data in sorted order, how do you build
>> a BST from that faster than O(n log n) time?
>
> Use a model BST, filled with 1 to n+m (or whatever pleases you; you just
> need n+m values).
>
> Then run through the BST in order, and replace each element with the head
> of the list, removing it as you go. (It's not the traditional BST replace,
> which might rebalance, just a basic substitution.)
By "in order", I presume you mean from smallest to greatest, rather than
(say) from the top of the tree to the bottom?
Seems to me you need to know the size of the tree beforehand.
Notice that if we have two arbitrary sets S and T, then
max(|S|, |T|) <= |S union T| <= |S| + |T|
In other words, we don't know the size of the result set until we
actually construct it.
Is there some way to build the resulting tree incrementally, if you
don't know what its final size should be? (I'm guessing "no", if only
because the result tree needs to be balanced...)
On 31/01/2012 14:11, Invisible wrote:
>>> Question: If you have a bunch of data in sorted order, how do you build
>>> a BST from that faster than O(n log n) time?
>>
>> Use a model BST, filled with 1 to n+m (or whatever pleases you; you just
>> need n+m values).
>>
>> Then run through the BST in order, and replace each element with the head
>> of the list, removing it as you go. (It's not the traditional BST replace,
>> which might rebalance, just a basic substitution.)
>
> By "in order", I presume you mean from smallest to greatest, rather than
> (say) from the top of the tree to the bottom?
>
> Seems to me you need to know the size of the tree before hand.
>
> Notice that if we have two arbitrary sets S and T, then
>
> max(|S|, |T|) <= |S union T| <= |S| + |T|
>
> In other words, we don't know the size of the result set until we
> actually construct it.
Which is the size of the list, so no problem there.
>
> Is there some way to build the resulting tree incrementally, if you
> don't know what its final size should be? (I'm guessing "no", if only
> because the result tree needs to be balanced...)
Make a few studies of balanced trees of various sizes; see if you can get
some pattern based on the number of nodes.
For instance, let N be the number of nodes.
If N = 2x+1, the top level is one node giving access to x nodes on each side.
Which means we know the form for half the number of nodes (when N is odd).
A tree of 5 nodes might be any of:
    3
   / \
  2   4
 /     \
1       5
or
   3
  / \
 1   5
  \ /
  2 4
or
    4
   / \
  2   5
 / \
1   3
I'll let you think about how to deal with even numbers...
--
Software is like dirt - it costs time and money to change it and move it
around. Just because you can't see it, it doesn't weigh anything, and you
can't drill a hole in it and stick a rivet into it doesn't mean it's free.
On 1/31/2012 1:23 AM, Invisible wrote:
> Question: If you have a bunch of data in sorted order, how do you build
> a BST from that faster than O(n log n) time?
>
> (I can kind of see how that would work if the data is in an array;
> arrays support random access, so you could just binary chop the array
> and keep the results as a BST. I'm trying to figure out if you can do
> anything with a linked list, which does /not/ support random access...)
Although I haven't done it myself, I believe if you actually do the
complexity analysis you'll find that the lack of random access doesn't
change the complexity of the obvious recursive algorithm except by a
constant factor.
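Indeed, a common way to see it is to recurse on node *counts* rather than array indices, consuming the list strictly front-to-back. A sketch in Python (the `ListNode`/`Node` classes and names are illustrative, not from the thread):

```python
class ListNode:
    def __init__(self, value, next=None):
        self.value, self.next = value, next

class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def list_to_bst(head, n):
    """Build a balanced BST from the first n nodes of a sorted singly linked
    list, consuming the list strictly front-to-back -- no random access
    needed.  Returns (root, remainder of list).  Each list node is visited
    exactly once, so the whole build is O(n)."""
    if n == 0:
        return None, head
    left, head = list_to_bst(head, n // 2)            # build left subtree first
    root = Node(head.value, left, None)               # current list head becomes the root
    right, head = list_to_bst(head.next, n - n // 2 - 1)
    root.right = right
    return root, head
```

Because the left subtree is built before its root is read, the in-order structure of the tree lines up exactly with the order of the list, and no node is ever revisited.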
Kevin Wampler <wam### [at] uwashingtonedu> wrote:
> 1) Flatten BSTs into sorted arrays / linked lists O(n+m)
> 2) Merge arrays O(n+m)
> 3) Rebuild BST from sorted array O(m+n)
If there's a way to build a BST from a sorted array in linear time,
you are right. There probably is. It saves the need for rebalancing
after each insertion.
Since elements in a BST usually have pointers to each other anyways,
and since merging two linked lists can be done without requiring any
additional memory, this whole operation might be possible using O(1)
memory. (Although building a BST from a linked list might require
O(log n) memory.)
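For illustration, the O(1)-extra-memory merge of two sorted singly linked lists might look like this (a Python sketch; names are my own). It relinks the existing nodes rather than allocating new ones:

```python
class ListNode:
    def __init__(self, value, next=None):
        self.value, self.next = value, next

def merge_in_place(a, b):
    """Set-union merge of two sorted linked lists by relinking the existing
    nodes: O(n+m) time, O(1) extra memory (just one dummy head node)."""
    dummy = tail = ListNode(None)
    while a is not None and b is not None:
        if a.value < b.value:
            tail.next, a = a, a.next
        elif b.value < a.value:
            tail.next, b = b, b.next
        else:                        # duplicate: keep one node, drop the other
            tail.next, a, b = a, a.next, b.next
        tail = tail.next
    tail.next = a if a is not None else b   # splice on whatever remains
    return dummy.next
```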
--
- Warp