Question

I need to get all text between <Annotation> and </Annotation>, where a word MATCH occurs. How can I do it in VIM?

<Annotation about="MATCH UNTIL </Annotation>   " timestamp="0x000463e92263dd4a" href="     5raS5maS90ZWh0YXZha29rb2VsbWEvbGFza2FyaS8QyrqPk5L9mAI">                                                                        
  <Label name="las" />
  <Label name="_cse_6sbbohxmd_c" />
  <AdditionalData attribute="original_url" value="MATCH UNTIL </Annotation>       " />
</Annotation>
<Annotation about="NO MATCH" href="     Cjl3aWtpLmhlbHNpbmtpLmZpL2Rpc3BsYXkvbWF0aHN0YXRLdXJzc2l0L0thaWtraStrdXJzc2l0LyoQh_HGoJH9mAI">
  <Label name="_cse_6sbbohxmd_c" />
  <Label name="courses" />
  <Label name="kurssit" />
  <AdditionalData attribute="original_url" value="NO MATCH" />
</Annotation>
<Annotation about="MATCH UNTIL </ANNOTATION>     " score="1" timestamp="0x000463e90f8eed5c" href="CiZtYXRoc3RhdC5oZWx     zaW5raS5maS90ZWh0YXZha29rb2VsbWEvKhDc2rv8kP2YAg">
  <Label name="_cse_6sbbohxmd_c" />
  <Label name="exercises_without_solutions" />
  <Label name="tehtäväkokoelma" />
  <AdditionalData attribute="original_url" value="MATCH UNTIL </ANNOTATION>" />
</Annotation>
Was it helpful?

Solution

Does it have to be done within vim? Could you cheat, and open a second window where you pipe something into more/less that tells you what line number to go to within vim?

-- edit --

I have never done a multi-line match/search in vi[m]. However, to cheat in another window:

perl -n -e 'if ( /<tag/ .. /<\/tag/)' -e '{ print "$.:$_"; }' file.xml | less

will show the elements/blocks for "tag" (or other longer matching names), with line numbers, in less, and you can then search for the other text within each block.

Close enough?

-- edit --

within "less", type

/MATCH

to search for occurrences of MATCH. On the left margin will be the line number where that instance (within the targeted element/tags) is.

within vi[m], type

:n

where "n" is the desired line number.

Of course, if what you really wanted to do was some kind of search/yank/replace, it's more complicated. At that point, awk / perl / ruby (or something similar which meets your tastes ... or xsl?) is really the tool you should be using for the transformation.

OTHER TIPS

First, a disclaimer: Any attempt to slice and dice XML with regular expressions is fragile; a real XML parser would do better.

The pattern:

\(<Annotation\(\s*\w\+="[^"]\{-}"\s\{-}\)*>\)\@<=\(\(<\/Annotation\)\@!\_.\)\{-}"MATCH\_.\{-}\(<\/Annotation>\)\@=

Let's break it down...

Group 1 is <Annotation\(\s*\w\+="[^"]\{-}"\s\{-}\)*>. It matches the start-tag of the Attribute element. Group 2, which is embedded in Group 1, matches an attribute and may be repeated 0 or more times.

Group 2 is \s*\w\+="[^"]\{-}"\s\{-}. Most of these pieces are commonly used; the most unusual is \{-}, which means non-greedy repetition (*? in Perl-compatible regular expressions). The non-greedy whitespace match at the end is important for performance; without it, Vim will try every possible way to split the whitespace between attributes between the \s* at the end of Group 2 and the \s* at the beginning of the next occurrence of Group 2.

Group 1 is followed by \@<=. This is a zero-width positive look-behind. It prevents the start-tag from being included in the matched text (e.g., for s///).

Group 3 is \(<\/Annotation\)\@!\_.. It includes Group 4, which matches the beginning of the Attribute end-tag. The \@! is a zero-width negative look-ahead and \_. matches any character (including newlines). Together, this groups matches at any character except where the Attribute end-tag starts. Group 3 is followed by a non-greedy repetition marker \{-} so that it matches the smallest block of text before MATCH. If you were to use \_. instead of Group 3, the matched text could include the end-tag of an Annotation element that did not include MATCH and continue through into the next Annotation element with MATCH. (Try it.)

The next bit is straightforward: Find MATCH and a minimal number of other characters before the end-tag.

Group 5 is easy: It's the end tag. \@= is a zero-width positive look-ahead, which is included here for the same reason as the \@<= for the start-tag. We have to repeat <\/Attribute rather than use \4 because groups with zero-width modifiers aren't captured.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top