It appears that nltk right-strips whitespace from the string before applying the regex.
See the source code (or you could import inspect and print inspect.getsource(nltk.re_show)):
def re_show(regexp, string, left="{", right="}"):
    """docstring here -- I stripped it for brevity"""
    print(re.compile(regexp, re.M).sub(left + r"\g<0>" + right, string.rstrip()))
In particular, see the string.rstrip() call, which strips all trailing whitespace.
For example, if you make sure that your phillip string does not end in whitespace (here, by appending a '.'), the space is kept:
nltk.re_show(r'\w+|[^\w]+', phillip + '.')
# {#}{awesome}{ .}
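To see the effect without nltk installed, here is a minimal sketch that mirrors the substitution in re_show but returns the result instead of printing it. The helper name show_matches, its strip flag, and the value assigned to phillip are my own assumptions for illustration; none of them are part of nltk.

```python
import re


def show_matches(regexp, string, left="{", right="}", strip=True):
    """Hypothetical stand-in for nltk.re_show that returns the marked-up string.

    With strip=True it right-strips the input first, as re_show does;
    with strip=False it leaves trailing whitespace in place.
    """
    if strip:
        string = string.rstrip()
    return re.compile(regexp, re.M).sub(left + r"\g<0>" + right, string)


phillip = "#awesome "  # assumed input: note the trailing space

stripped = show_matches(r"\w+|[^\w]+", phillip)             # trailing space is lost
kept = show_matches(r"\w+|[^\w]+", phillip, strip=False)    # trailing space is matched
dotted = show_matches(r"\w+|[^\w]+", phillip + ".")         # space is no longer trailing

print(stripped)  # {#}{awesome}
print(kept)      # {#}{awesome}{ }
print(dotted)    # {#}{awesome}{ .}
```

If you need the trailing whitespace to show up, calling re.sub yourself, as in the strip=False branch, sidesteps the rstrip() entirely.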
I'm not sure why nltk would do this; it seems like a bug to me.