Tree matching using serialization of tree and unique id generation for each subtree

StackOverflow https://stackoverflow.com/questions/694132

  •  22-08-2019
  •  | 
  •  

Question

Binary tree http://img9.imageshack.us/img9/9981/binarytree.jpg

What would be the best way to serialize a given binary tree and inturn evaluate a unique id for each serialized binary tree?

For example, I need to serialize the sub-tree (2,7,(5,6,11)) and generate a unique id 'x' representing that sub-tree so that whenever I come across a similar sub-tree (2,7,(5,6,11)) it would serialize to the same value 'x' and hence I can deduce that I've found a match.

Here we assume that each node has properties that are unique. In the above example, it would be the numbers assigned to each node and hence they would always generate the same ids for similar sub-trees. I'm trying to do this in C++.

Do algorithms already exist to perform such serialized tree matching?

Was it helpful?

Solution

Do you want to to be able match any arbitrary part of the tree or a subtree running upto some leaf node(s)? IIUC, you are looking at suffix matching.

You can also look at Compact Directed Acyclic Word Graph for ideas.

OTHER TIPS

I would make a hash value (in some Rabin-Karp fashion) based on the nodes' IDs and position in the tree, ie:

long h = 0
for each node in sub tree:
    h ^= node.id << (node.depth % 30)

in pseudo code. The downside is that different subtrees may yield the same hash value. But the advantage is that it is fast to compare hash values, and when match is found you can further investige the actual sub tree for equality.

If you're not looking for high efficiency, you might want to use a very simple depth-first-search algorithm.

"2,7,2,U,6,5,U,11,U,U,U,5,9,4"

As you can see, i added U commands ("up") so as to show where the next child would be created. Of course you can make this more efficient, but i believe that's a start.

Also, you might want to have a look at Boost.Graph (BGL) for implementation.

What's wrong with the parentheses notation like you used in your question?

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top