Journal Entry for Wednesday, June 20, 2001

10001110100110101

Sun	Mon	Tue	Wed	Thu	Fri	Sat
3	4	5*	6*	7*	8	9
10*	11*	12*	13	14	15*	16
17*	18	19*	[20]*	21	22*	23
24*	25	26*	27	28*	29*	30
1	2	3	4	5*	6	7

[3:19 PM EDT - Are you single?]

WTF am I doing?!? Maybe I lost a few brain cells diving for that disc on Monday. Man, I think I'm getting more and more blunt in my old age..

Wednesday, June 20, 2001 at 20:19:18 (UTC)

hey you computer science types!
here's a problem - it might be solved brilliantly somewhere? but I don't know!
suppose I have a (big) binary tree with non unique nodes, and I have (presumably) smaller binary tree that I want to compare against the big tree to see if there are any matching subtrees. What's the best way to do this?

girl

Wednesday, June 20, 2001 at 20:21:03 (UTC)

btw, I've been getting many wonderful birthday greetings (except Hwan's doesn't exactly fit the wonderful category - but I appreciate it nonetheless!)
thanks everyone and thanks to the small feathered messenger! : )

girl

Thursday, June 21, 2001 at 01:35:02 (UTC)

There are two ways I can think of to do this, off the top of my head.

1) Do a brute-force compare. Traverse the big tree, and for every node, compare each sub-tree against the smaller tree by traversing both, comparing each node. Sounds like a lot of work, but it doesn't look to be much more than O(n) where n is the size of the big tree.

2) Create an formula which assigns an integer value to a tree (add up all the values, or sum(node^depth) is better). Calculate the value for the smaller tree, and traverse the bigger tree, calculating the value for each subtree. Compare the integers. If there's a match, do a node-by-node compare.

1) is probably better, unless you can think of clever optimizations that my addled brain can't at the moment. I think 2) approaches O(n^2). O(n) is probably optimal.

FlyingS

Thursday, June 21, 2001 at 16:03:33 (UTC)

I'm not a "computer science type" but...
The optimal answer probably involves finding a matching pattern of "layers" rather than a pattern of disembodied nodes. Consider that in a binary tree, every node has three connections: 1 to the layer above (call it X), and 2 to the layer below (Y and Z). We have B a large tree, and b a small subtree to be pattern-matched.
0- Start at the leftmost node of the bottom-most layer of b (node index (1, d), where d is the depth of b)
1- Scan the Dth layer of B for a matching node value (ie compare (1,d) with (?,D))
2- For all matching nodes on layer D of B, match connection X with that of d, discarding the nodes on B that do not match
3- Decrement D and d by 1. In the b tree, start with the value of connection X now (ie go up one layer in b, but remain at the leftmost node). Go to 1.

Eventually, you will have a set of nodes on B that all agree from bottom to top with b*, as far as the left-most nodes are concerned. To decide among these candidates for a pattern-match, step 0 can now be changed to the second-leftmost node and recursed. The third pass will scan starting from the third-most node. And so on.

I don't think this is a very expensive scan, as there is always a chance that you will find the subtree in question withuot having to compare the entire tree, because you might end up with only one candidate on B early on. Of course, to make sure, you need to do a full comparison of your remaining candidate to guarantee that it IS the subtree you're looking for.

*Actually, there's no guarantee that b's pattern will indeed be found among these candidates. b's pattern may be located much higher up in the tree... but we stopped scanning because we couldn't decrement d any more. Still, if it turns out to be deep in B, we saved a lot of time! If not, we just decrement D some more.

I hope this fits in QYV's comment field... it's hard to describe without using an actual computer language. I COULD use Xpath, actually... but I don't know if anybody knows its lingo (parent, child, siblings, blabla).

reg_at_verk <e-mail>

Thursday, June 21, 2001 at 16:13:16 (UTC)

I betcha my (probabilistic) algorithm is O(log2n)
or maybe log2O(n).
You're right... I never took courses to classify algorithmic complexity/execution time! hehe

reg_at_verk <e-mail>

Thursday, June 21, 2001 at 16:17:23 (UTC)

Actually, this would've been an excellant forum topic. Too bad my forum's still incomplete. *taps on construction sign* I guess I'll be working on that "soon".

QYV

Thursday, June 21, 2001 at 16:20:20 (UTC)

OK third comment on this, then it's back to work!
Intuitively, how can the operating time of ANY algorithm to solve this depend ONLY on n, the size of the large tree? I mean, probabilistically and from a pattern-matching point of view, the larger the size of the small tree, the more likely and quickly it will be recognized within the large tree.
so the algorithm shuold depend on n and m (size of the small tree). At least, if your algorithm isn't just a brute-force node-by-node compare. If it were just a brute-force thing, then it would depend on n+m, really, which can be lumped into T, the total size of the two trees. For some reason, this stuff fascinates me! *blink* OK i'll shut up now.

reg_at_verk <e-mail>

Thursday, June 21, 2001 at 18:28:23 (UTC)

... it might work, except that a "binary tree" doesn't have links up the tree. You're also assuming that the tree is balanced (how do you find the left-most node on the bottom-most layer?). It's pretty rare that a binary tree problem isn't going to be solved best by a recursive algorithm.

big-O algorithm guestimation is, as far as I'm concerned, a bit of a black art. I don't spend too much time on it. (I'm also kinda wrong: my claim of "a bit more than O(n)" above is only true in the average case). What it talks about is how execution time (or space or whatever) grows for large values of n. Say if the precise formula for measuring the time taken by the algorithm is t(n,m) = 17n^2 + 24n + 45000m + 640, t(n,m) is O(n^2). You can give examples to muck things up, but that's kind of missing the point. You're estimating algorithm efficiency. In doing so, you've got to boil it down to what's most important, and in the simplest terms possible (and choose wisely...).

FlyingS

Friday, June 22, 2001 at 13:40:57 (UTC)

Doh!... I admit it, I did assume a balanced binary tree.
I even have a drawing of it (pyramid-like) in my notebook here. I was trying to figure out the algorithm with the CEO here droning on about our "value proposition". *blink*
oh well, I had fun working out stuff anyway. So, girl, what's the answer!?!?!

reg_at_verk <e-mail>

Friday, June 29, 2001 at 15:07:35 (UTC)

whew! thanks guys!! I think I'm gonna go with FlyingS's #1. Are you sure it's only order N (size of large tree)? Wouldn't it be closer to something like O(N*M) ?
sorry to distract you at work, reg! heehee.. CEOs should realize how boring they are anyways!

girl

Friday, June 29, 2001 at 15:42:07 (UTC)

Nope (with the proviso that I'm talking in the average case, with fairly random data). 'Cuz once you find a node that doesn't match, you stop comparing that subtree. On average, it's not likely you'll have too many matches... effectively O(1).

FlyingS

Friday, June 29, 2001 at 17:30:08 (UTC)

So then would it be something like:
Worst Case: O(N*M)
Avg Case: O(N)
Best Case: O(1)
?
I did this in first year, but I distinctly remember not likin' it! :)

girl

Friday, June 29, 2001 at 18:47:25 (UTC)

Yup, basically.

Alternatively, if m really is "small," can reasonably considered constant, or simply doesn't interest us (probably for one of the above two reasons), we can ignore it, making it O(n) in the best case, as well.

Which makes me wonder where I came up with the O(n^2) nonsense... I did say this was a black art, didn't I? ^_^

FlyingS

Friday, June 29, 2001 at 18:48:45 (UTC)

(Actually, no, you don't get O(1) in the best case. You've got to traverse the tree (comparing all n elements) at least once)

FlyingS

Monday, July 14, 2025 @ 16:38:18 EDT

« List of pages on this site:

« List of recent entries:

Nov. 11, 2011 - Dream Sequence: Visitin..
Oct. 30, 2011 - We Arr Pirates!
Oct. 12, 2011 - Skipping badminton.
Oct. 2, 2011 - A new month. What did I..
Sep. 24, 2011 - Riding out the invites.

« List of recent comments:

Apr. 13, 2011 - QYV
Apr. 9, 2011 - QYV
Apr. 13, 2011 - Dr. HeKyLl
Apr. 13, 2011 - Dangerman
Apr. 9, 2011 - Dangerman

« List of recent links:

« List of random quotes:

"I love being a writer. What I can't stand is the paperwork."

Peter De Vries (From The Quotations Page.)