Ruby list of tags to a fluent regex
Problem
I want to clean an HTML page of its tags, using Ruby. I have the raw HTML, and would like to define a list of tags, e.g. ['span', 'li', 'div'], and create an array of regular expressions that I could run sequentially, so that I have
clean_text = raw.gsub(first_regex,' ').gsub(second_regex,' ')...
with two regular expressions per tag (start and end).
Is there a way to do this programmatically, i.e. pre-build the regex array from a tag array and then apply the regexes as a fluent chain?
EDIT: I realize I actually asked two questions at once: the first about transforming a list of tags into a list of regular expressions, and the second about applying a list of regular expressions as a fluent chain. Thanks for answering both. I will try to keep my next questions to a single topic.
Solution
This should produce a single regexp to remove all your tags.
clean_text = raw.gsub(/<\/?(#{tags.join("|")})>/, '')
However, you will have to extend it to support tags with attributes (e.g. <a href="...">). As written, it only removes bare tags (e.g. <a>).
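A minimal sketch of one way to extend that single regexp to tags with attributes. It assumes attribute values never contain a literal `>`; regexps are not a full HTML parser. `Regexp.escape` protects each tag name, `\b` after the name keeps `li` from also matching `<link>`, and `[^>]*` consumes any attributes up to the closing `>`. Each tag is replaced with a space, as in the question, so neighbouring words don't run together. The method name `strip_tags` is hypothetical.

```ruby
# Hypothetical helper: remove opening and closing forms of the given tags,
# with or without attributes, e.g. <div>, </div>, <a href="x">.
# Assumption: attribute values never contain a literal '>'.
def strip_tags(raw, tags)
  names = tags.map { |t| Regexp.escape(t) }.join('|')  # "span|li|div"
  pattern = /<\/?(?:#{names})\b[^>]*>/i                 # i: also match <DIV>, </Span>
  raw.gsub(pattern, ' ')
end

html = '<div class="box"><span>Hello</span> <li>world</li></div>'
p strip_tags(html, %w(span li div)).split  # => ["Hello", "world"]
```

Joining the escaped names into a plain string (rather than interpolating a `Regexp` object) keeps the outer `i` flag in effect for the tag names.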
Other tips
Assuming you have a build_regex method that turns a tag into a regex, this should do it:
tags = %w(span div li)
clean_text = tags.inject(raw) {|text, tag| text.gsub build_regex(tag), ' ' }
The inject call passes the result of each substitution into the next iteration of the block, so each gsub runs on the output of the previous one, exactly as if you had chained the calls by hand.
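The answer leaves build_regex undefined. Here is a minimal sketch covering both parts of the question, assuming a hypothetical `build_regexes` helper that returns two regexps per tag (opening and closing), as the question describes. `flat_map` turns the tag list into one flat regex array, and `reduce` (an alias of the `inject` used above) applies them in sequence.

```ruby
# Hypothetical helper: two regexps for one tag, one for the opening tag
# (attributes allowed) and one for the closing tag.
def build_regexes(tag)
  name = Regexp.escape(tag)
  [/<#{name}\b[^>]*>/i, /<\/#{name}\s*>/i]
end

tags = %w(span li div)

# Part 1: turn the tag list into a flat array of regexps (two per tag).
regexes = tags.flat_map { |tag| build_regexes(tag) }

# Part 2: apply them one after another, feeding each result into the
# next gsub; equivalent to raw.gsub(r1, ' ').gsub(r2, ' ')...
raw = '<div><span class="x">Hi</span></div>'
clean_text = regexes.reduce(raw) { |text, regex| text.gsub(regex, ' ') }
p clean_text.split  # => ["Hi"]
```

This makes one pass over the string per regexp (six here). The single combined regexp from the first answer does the same work in one pass, so it is usually preferable when you don't need per-tag control.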