Question

What are the maximum and minimum number of nodes in a suffix tree? And how can I prove it?

Était-ce utile?

La solution

Assuming an input text of N characters in length, the minimum number of nodes, including the root node and all leaf nodes, is N+1, the maximum number of nodes, including the root and leaves, is 2N-1.

Proof of minimum: There must be at least one leaf node for every suffix, and there are N suffixes. There need not be any inner nodes, example: if the text is a sequence of unique symbols, abc$, there are no branches, hence no inner nodes in the resulting suffix tree:

enter image description here

Hence the minimum is N leaves, 0 inner nodes, and 1 root node, in sum N+1 nodes.

Proof of maximum: The number of leaf nodes can never be larger than N, because a leaf node is where a suffix ends, and you can't have more than N distinct suffixes in a string of length N. (In fact, you always have exactly N distinct suffixes, hence N leaf nodes exactly.) The root node is always exactly 1, so the question is what is the maximum number of inner nodes. Every inner node introduces a branch in the tree (because inner nodes of a suffix tree have at least 2 children). Each new branch must eventually lead to at least one extra leaf node, so if you have K inner nodes, there must be at least K+1 leaf nodes, and the presence of the root node requires at least one additional leaf (unless the tree is empty). But the number of leaf nodes is bounded by N, so the maximum number of inner nodes is bounded by N-2. This yields exactly N leaves, 1 root, and a maximum of N-2 inner nodes, 2N-1 in total.

To see that this is not only a theoretical upper bound, but some suffix trees actually reach this maximum, consider as an example a string with just one repeated character: 'aaa$'. Confirm that the suffix tree for this has 7 nodes (including root and leaves):

enter image description here

Summary: As evident, the only real variable is the number of inner nodes; the number of roots and leaves is constant at 1 and N for all suffix trees, while the number of inner nodes varies between 0 and N-2.

Licencié sous: CC-BY-SA avec attribution
Non affilié à StackOverflow
scroll top