Question

I've implemented an algorithm to construct a Suffix Tree. Now, I'm trying to implement a method count that returns the number of times query occurs as a sublist/subinterval of the reference sequence. What's the best way to do that?

Example:

suffix tree for sequence

1,2,50,100,25,25,25,50,100,25,25 

query

25,25

result

3

Solution

One approach is:

  1. Add a unique terminating symbol to the list (e.g. -1).

  2. Construct the suffix tree.

  3. Now walk down the suffix tree based on the numbers in the query.

  4. If this is impossible then the query appears 0 times.

  5. Otherwise, count the leaf nodes in the subtree rooted at your current position.

The number of times the query appears in the string is equal to the number of leaf nodes in the subtree.
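The steps above can be sketched as follows. This is a minimal illustration, not the asker's implementation: it uses a naive, uncompressed suffix trie (nested dicts, O(n^2) to build) as a stand-in for a real suffix tree, since the leaf-counting logic is the same. The names `build_suffix_trie`, `count_leaves`, and `count` are hypothetical, and -1 is assumed not to occur in the sequence.

```python
TERMINATOR = -1  # assumption: this value never occurs in the sequence


def build_suffix_trie(seq):
    """Naive uncompressed suffix trie as nested dicts (a stand-in for a suffix tree)."""
    data = list(seq) + [TERMINATOR]
    root = {}
    for start in range(len(data)):
        node = root
        for symbol in data[start:]:
            node = node.setdefault(symbol, {})
    return root


def count_leaves(node):
    """Count the leaves below `node` with an explicit stack (avoids deep recursion)."""
    leaves = 0
    stack = [node]
    while stack:
        current = stack.pop()
        if not current:  # an empty dict has no children, so it is a leaf
            leaves += 1
        else:
            stack.extend(current.values())
    return leaves


def count(root, query):
    """Walk down the trie along `query`, then count the leaves in that subtree."""
    node = root
    for symbol in query:
        if symbol not in node:
            return 0  # the walk is impossible, so the query never occurs
        node = node[symbol]
    return count_leaves(node)


sequence = [1, 2, 50, 100, 25, 25, 25, 50, 100, 25, 25]
root = build_suffix_trie(sequence)
print(count(root, [25, 25]))
```

With a compressed suffix tree, the only difference is that the walk has to compare the query against multi-symbol edge labels and may stop partway along an edge; the subtree below that edge is still the one to count.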

If you need to answer several queries, a single depth-first search can count the leaf nodes below every node in O(n) time, where n is the length of the sequence, and store each count in its node. Each query then takes O(k) time, where k is the length of the query, assuming constant-time child lookup at each node.
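A rough sketch of this precomputation, under the same assumptions as any toy example: the `Node` class, its `leaf_count` field, and the function names are hypothetical, and the tree is again a naive uncompressed suffix trie. Note that such a trie can have O(n^2) nodes, so the O(n) bound for the depth-first pass applies only to a properly compressed suffix tree.

```python
class Node:
    def __init__(self):
        self.children = {}   # symbol -> Node
        self.leaf_count = 0  # filled in by annotate_leaf_counts


def build_suffix_trie(seq, terminator=-1):
    """Naive uncompressed suffix trie; `terminator` is assumed absent from `seq`."""
    data = list(seq) + [terminator]
    root = Node()
    for start in range(len(data)):
        node = root
        for symbol in data[start:]:
            child = node.children.get(symbol)
            if child is None:
                child = node.children[symbol] = Node()
            node = child
    return root


def annotate_leaf_counts(root):
    """Iterative post-order DFS that stores the number of leaves below each node."""
    stack = [(root, False)]
    while stack:
        node, children_done = stack.pop()
        if not node.children:
            node.leaf_count = 1
        elif children_done:
            node.leaf_count = sum(c.leaf_count for c in node.children.values())
        else:
            # Revisit this node after all of its children have been processed.
            stack.append((node, True))
            stack.extend((c, False) for c in node.children.values())


def count(root, query):
    """Answer a query by walking len(query) steps and reading the stored count."""
    node = root
    for symbol in query:
        node = node.children.get(symbol)
        if node is None:
            return 0
    return node.leaf_count


root = build_suffix_trie([1, 2, 50, 100, 25, 25, 25, 50, 100, 25, 25])
annotate_leaf_counts(root)
print(count(root, [25, 25]))
```

After the one-off annotation pass, each call to `count` does no traversal beyond the walk along the query itself.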

This works because, with the terminator appended, your suffix tree has exactly one leaf node for each suffix (shown here without the terminator):

1,2,50,100,25,25,25,50,100,25,25
2,50,100,25,25,25,50,100,25,25
50,100,25,25,25,50,100,25,25
100,25,25,25,50,100,25,25
25,25,25,50,100,25,25
25,25,50,100,25,25
25,50,100,25,25
50,100,25,25
100,25,25
25,25
25

Of these, after you follow the 25,25 query down the tree, the leaf nodes remaining in the subtree correspond to the suffixes that begin with 25,25:

25,25,25,50,100,25,25
25,25,50,100,25,25
25,25

which gives a count of 3 occurrences of the query in the sequence.
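To check a suffix tree implementation against this reasoning, one option is a brute-force helper that lists the suffixes beginning with the query directly, without any tree. This is only a sanity check for small inputs, and the name `suffixes_starting_with` is hypothetical.

```python
def suffixes_starting_with(seq, query):
    """Return every suffix of `seq` that begins with `query` (brute force)."""
    seq, query = list(seq), list(query)
    k = len(query)
    return [seq[i:] for i in range(len(seq)) if seq[i:i + k] == query]


sequence = [1, 2, 50, 100, 25, 25, 25, 50, 100, 25, 25]
for suffix in suffixes_starting_with(sequence, [25, 25]):
    print(suffix)
```

The number of suffixes returned should equal the number of leaves found below the query's position in the tree.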

Licensed under: CC-BY-SA with attribution