Question

How can I, in a MySQL query, have the same behaviour as the Regex.Replace function (for instance in .NET/C#)?

I need that because, as many people, I would like to count the number of words in a field. However, I'm not satisfied with the following answer (given several times on that site):

SELECT LENGTH(name) - LENGTH(REPLACE(name, ' ', '') +1 FROM table

Because it doesn't give good results when there are more that one space between two words.

By the way, I think the Regex.Replace function may be interesting so all the good ideas are welcome !

Was it helpful?

Solution

There's REGEXP_REPLACE available as MySQL user-defined functions.

Word counting: If you can control the data going into the database, you can remove double whitespace before insert. Also if you have to access the word count often, you can compute it once in your code and store the count in the database.

OTHER TIPS

UPDATE: Have now added a separate answer for MySQL 8.0+, which should be used in preference. (Retained this answer in case of being constrainted to using an earlier version.)

Almost a duplicate of this question but this answer will address the use case of counting words based on the advanced version of the custom regular expression replacer from this blog post.

Demo

Rextester online demo

For the sample text, this gives a count of 61 - the same as all online word counters I've tried (e.g. https://wordcounter.net/).

SQL (excluding function code for brevity):

SELECT txt,
       -- Count the number of gaps between words
       CHAR_LENGTH(txt) -
       CHAR_LENGTH(reg_replace(txt,
                               '[[:space:]]+', -- Look for a chunk of whitespace
                               '^.', -- Replace the first character from the chunk
                               '',   -- Replace with nothing (i.e. remove the character)
                               TRUE, -- Greedy matching
                               1,  -- Minimum match length
                               0,  -- No maximum match length
                               1,  -- Minimum sub-match length
                               0   -- No maximum sub-match length
                               ))
       + 1 -- The word count is 1 more than the number of gaps between words
       - IF (txt REGEXP '^[[:space:]]', 1, 0) -- Exclude whitespace at the start from count
       - IF (txt REGEXP '[[:space:]]$', 1, 0) -- Exclude whitespace at the end from count
       AS `word count`
FROM tbl;

The answer is no you cannot have the same behaviour in MySQL.

But i recommend you checkout this earlier question on the subject which links to a UDF that supposedly enables some of this functionality.

MySQL 8.0 now provides a decent REGEXP_REPLACE function, which makes this much simpler:

SQL

SELECT -- Count the number of gaps between words
       CHAR_LENGTH(txt) -
           CHAR_LENGTH(REGEXP_REPLACE(
               txt,
               '[[:space:]]([[:space:]]*)', -- A chunk of one or more whitespace characters
               '$1')) -- Discard the first whitespace character and retain the rest
           + 1 -- The word count is 1 more than the number of gaps between words
           - IF (txt REGEXP '^[[:space:]]', 1, 0) -- Exclude whitespace at the start from count
           - IF (txt REGEXP '[[:space:]]$', 1, 0) -- Exclude whitespace at the end from count
           AS `Word count`
FROM tbl;

Demo

DB-Fiddle online demo

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top