Pergunta

Are there an efficient algorithm to search and dump all common substrings (which length is 3 or longer) between 2 strings?

Example input:

Length   : 0    5    10   15   20   25   30

String 1 : ABC-DEF-GHI-JKL-ABC-ABC-STU-MWX-Y
String 2 : ABC-JKL-MNO-ABC-DEF-PQR-DEF-ZWX-Y

Example output:

In string 1           2
---------------------------
ABC-DEF   0           12
ABC-DE    0           12
BC-DEF    1           13
:
-ABC-     15,19       11
-JKL-     11          3
-DEF-     3           15
-JKL      11          3
JKL-      12          4
-DEF      3           15,23
DEF-      4           16
WX-Y      29          29
ABC-      0,16,20     0,12
-ABC      15,19       11
DEF-      4           16,24
DEF       4           16,24
ABC       0,16,20     0,12
JKL       12          4
WX-       29          29
X-Y       30          30
-AB       15,19       11
BC-       1,17,21     1,13
-DE       3           15,23
EF-       5           17,25
-JK       11          3
KL-       13          5
:

In the example, "-D", "-M" is also a common substring but is not required, because it's length is only 2. (There might be some missing outputs in example because there are so many of them...)

Foi útil?

Solução

You can find all common substrings using a data structure called a Generalized suffix tree

Libstree contains some example code for finding the longest common substring. That example code can be modified to obtain all common substrings.

Licenciado em: CC-BY-SA com atribuição
Não afiliado a StackOverflow
scroll top