SQL word root matching

https://stackoverflow.com/questions/4051572

27-09-2019

Question

I'm wondering whether the major SQL engines (MS SQL, Oracle, MySQL) can recognize that two words are related because they share the same root.

We know it's easy to match "networking" when searching for "network" because the latter is a substring of the former.

But do SQL engines have functions that can match "network" when searching for "networking"?

Thanks a lot.

Solution

This functionality is called stemming, and the algorithm that performs it is a stemmer: it deduces the stem from any form of a word.
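As a minimal sketch of the idea, assuming English input, a stemmer can be as crude as stripping a known suffix. The suffix list and the name `toy_stem` below are hypothetical; real stemmers such as Porter's apply many ordered rules with conditions on what remains of the word.

```python
# Hypothetical suffix list, checked in order; NOT the Porter/Snowball rules.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    for w in ("network", "networking", "networks", "networked"):
        print(w, "->", toy_stem(w))  # every line ends in "-> network"
```

Because "network" and "networking" reduce to the same stem, comparing stems instead of raw strings makes the match work in both directions, which plain substring matching cannot do.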

This can be quite complex: for instance, the Russian words шёл and иду are different forms of the same verb, yet they do not share a single letter (ironically, the same is true in English: went and go).
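Suffix rules alone cannot relate such forms. A common workaround, sketched here under the assumption of a small hand-written table (the `IRREGULAR` entries, the fallback rule and the name `normalize` are all hypothetical), is to look irregular forms up first and only then apply suffix rules. Mapping words to a dictionary form this way is usually called lemmatization rather than stemming.

```python
# Suffix rules cannot relate "went" to "go"; a lookup table can.
# IRREGULAR is a tiny hypothetical table, not a real lexicon.
IRREGULAR = {
    "went": "go",
    "gone": "go",
    "шёл": "идти",  # Russian: past tense, masculine ("walked")
    "иду": "идти",  # Russian: present tense, first person ("I walk")
}


def normalize(word: str) -> str:
    """Return a canonical form: irregular table first, then a suffix rule."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    # Hypothetical fallback rule: strip "ing" if at least two letters remain.
    if word.endswith("ing") and len(word) - 3 >= 2:
        return word[:-3]
    return word
```

The table grows with every irregular verb and every language, which is one reason real stemmers and lemmatizers are language-specific components.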

Word breaking (splitting text into words) can also be a complex task for languages that do not use spaces between words, such as Chinese or Japanese.
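One classic, simple approach is greedy maximum matching: at each position, take the longest dictionary word that fits. The sketch below assumes a tiny hypothetical lexicon and the hypothetical name `max_match`; production word breakers use large lexicons and statistical models.

```python
def max_match(text: str, lexicon: set[str]) -> list[str]:
    """Split `text` (written without spaces) into words by greedy longest match.

    Characters that start no known word are emitted one at a time.
    """
    longest = max([len(w) for w in lexicon] + [1])
    words: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, then shorter ones.
        for size in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i : i + size]
            if candidate in lexicon or size == 1:
                words.append(candidate)
                i += size
                break
    return words
```

Greedy matching works for `max_match("networksearch", {"network", "net", "work", "search"})`, which yields `["network", "search"]`, but it also shows why the task is hard: with the lexicon `{"the", "theta", "table", "bled", "down", "own", "there"}` it splits "thetabledownthere" into "theta bled own there" instead of "the table down there".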

SQL Server supports pluggable stemmers and word breakers for its full-text search engine:

http://msdn.microsoft.com/en-us/library/ms142509.aspx

OTHER TIPS

I think the broader topic is "semantic similarity"; there are several ongoing efforts to find good solutions to this problem.

You can try SOUNDEX, though it might not be exactly what you want: it matches words that sound alike rather than words that share a root. See http://www.codeproject.com/KB/database/Phonetic_Search_MSSQL.aspx.
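To see what Soundex actually compares, here is a minimal sketch of the American Soundex rules as commonly described (first letter kept, remaining consonants mapped to digits, padded or truncated to four characters). The built-in SOUNDEX functions in SQL Server, Oracle and MySQL may differ in details; MySQL, for example, can return codes longer than four characters. The name `soundex` here is just this sketch's function.

```python
import string

# American Soundex digit classes; vowels, y, h and w get no digit.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word: str) -> str:
    """Return the 4-character American Soundex code of `word` ('' if no letters)."""
    letters = [c for c in word.lower() if c in string.ascii_lowercase]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        # Every letter except h and w updates `prev`; vowels (and y) have no
        # digit, so they reset it and a repeated digit after them is kept.
        if ch not in "hw":
            prev = code
    return result.ljust(4, "0")
```

"network" and "networking" both give N362, but only because the code is cut off after four characters; "went" (W530) and "go" (G000) do not match, while the unrelated names "Robert" and "Rupert" both give R163.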

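As Quassnoi pointed out, this can be done with stemming. PostgreSQL implements it for full-text search: the built-in english configuration already maps words to the english_stem dictionary, and a custom configuration can be set up to do the same.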

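CREATE TEXT SEARCH CONFIGURATION blah_en (PARSER = pg_catalog.default);  -- starts with no token mappings
ALTER TEXT SEARCH CONFIGURATION blah_en ADD MAPPING FOR asciiword, word WITH english_stem;  -- stem plain words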

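english_stem is a Snowball dictionary, and the Snowball English stemmer is based on the Porter stemmer. The Porter stemmer is probably one of the most widely used stemmers, so it should give decent results. Remember, though, that stemming is not always as accurate as you might like: unrelated words can end up with the same stem (Porter, for example, reduces both "universe" and "university" to "univers").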
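The reason stemming makes "network" match "networking" in PostgreSQL is that to_tsvector and to_tsquery run the document and the query through the same text search configuration (a parser plus dictionaries), so both reduce to the same lexeme; `to_tsvector('english', 'networking') @@ to_tsquery('english', 'network')` is true. The sketch below mimics that flow in Python with pluggable components. It is not PostgreSQL's implementation: `regex_breaker`, `suffix_stemmer` and `TinyFullTextIndex` are hypothetical names, and the crude suffix stripper merely stands in for english_stem.

```python
import re
from typing import Callable

# Component types (hypothetical names): a word breaker splits text into
# tokens; a stemmer maps each token to a normalized form.
WordBreaker = Callable[[str], list[str]]
Stemmer = Callable[[str], str]


def regex_breaker(text: str) -> list[str]:
    """Split lowercased text into runs of word characters."""
    return re.findall(r"\w+", text.lower())


def suffix_stemmer(word: str) -> str:
    """Crude stand-in for english_stem: strip one common English suffix."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


class TinyFullTextIndex:
    """Inverted index mapping stem -> ids of documents containing it."""

    def __init__(self, breaker: WordBreaker, stemmer: Stemmer) -> None:
        self.breaker = breaker
        self.stemmer = stemmer
        self.postings: dict[str, set[int]] = {}

    def add(self, doc_id: int, text: str) -> None:
        # Index time: break the text into words and store each word's stem.
        for token in self.breaker(text):
            self.postings.setdefault(self.stemmer(token), set()).add(doc_id)

    def search(self, term: str) -> set[int]:
        # Query time: the term goes through the same stemmer, so
        # "networking" and "network" meet at the stem "network".
        return set(self.postings.get(self.stemmer(term.lower()), set()))
```

After indexing "Home network setup" as document 1 and "Networking for beginners" as document 2, searching for either "network" or "networking" returns `{1, 2}`. Swapping in a different breaker or stemmer changes matching behavior without touching the index code, which is the point of making them pluggable.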

Licensed under: CC-BY-SA with attribution
