Question

I'm having a lot of trouble writing this regular expression:

(?<=\s+|^\s*|\(\s*|\.)(?:item|item1|item2)(?=\s+|\s*$|\s*\)|\.)

It works well in my regex editor (Expresso) and in the .NET environment, but in the Java environment (JRE 1.6.0.25 using Eclipse Helios R2) it fails: the Pattern.compile() method throws a "Syntax error U_REGEX_LOOK_BEHIND_LIMIT" exception.

That's because the lookbehind pattern (?<=\s+|^\s*|\(\s*|\.) must have a bounded length (unbounded quantifiers such as * and + are not allowed there, as far as I know).

I also tried to specify the range of repetition in this way with no luck:

(?<=\s{0,1000}|^\s{0,1000}|\(\s{0,1000}|\.)(?:item|item1|item2)(?=\s+|\s*$|\s*\)|\.)

So, how can I write an equivalent regex that also works in the Java environment? I can't believe there's no workaround for such a common situation.


Solution

Keep in mind that the lookbehind will only look as far behind as it must. For example, (?<=\s+) will be satisfied if the previous character is a space; it doesn't need to look any farther back.

The same is true of your lookbehind. If it's not the beginning of the string and the previous character is not whitespace, an open-parenthesis or a period, there's no point looking any farther back. It's equivalent to this:

(?<=^|[\s(.])

Your lookahead can be condensed in the same way. If it's not the end of the string, and the next character is not whitespace, a close-parenthesis or a period, there's no point looking any further:

(?=[\s).]|$)

So the final regex is:

(?<=^|[\s(.])(?:item|item1|item2)(?=[\s).]|$)
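
As a quick check, the final regex can be compiled and exercised in Java. This is only a minimal sketch: the class name LookaroundDemo, the helper findMatches, and the sample input are made up for illustration and are not from the original question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
final class LookaroundDemo {
    // Each lookbehind alternative is at most one character long,
    // so the lookbehind has a bounded length and compiles.
    static final Pattern ITEM = Pattern.compile(
            "(?<=^|[\\s(.])(?:item|item1|item2)(?=[\\s).]|$)");

    // Collects every match of ITEM found in the input.
    static List<String> findMatches(String input) {
        List<String> found = new ArrayList<String>();
        Matcher m = ITEM.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // "xitem" fails the lookbehind; "item3" fails the lookahead.
        System.out.println(findMatches("item (item1) foo.item2 xitem item3"));
        // prints [item, item1, item2]
    }
}
```

Note that the alternation order does not matter here: when "item" matches but the lookahead fails (because the next character is "1" or "2"), the engine backtracks and tries "item1" and "item2".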

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow