[R] gsub: replacing a.*a if no occurrence of b in .*

Ulrich Keller ulrich.keller at emacs.lu
Sat Feb 24 12:47:52 CET 2007


I am trying to read a number of XML files using xmlTreeParse(). Unfortunately,
some of them are malformed in a way that makes R crash. The problem is that
closing tags are sometimes repeated like this:

<tag>value1</tag><tag>value2</tag>some garbage</tag></tag><tag>value3</tag>

I want to preprocess the contents of the XML file using gsub() before feeding
them to xmlTreeParse() to clean them up, but I can't figure out how to do it.
What I need is something that transforms the example above into:

<tag>value1</tag><tag>value2</tag><tag>value3</tag>

In other words, I am looking for some kind of "</tag>.*</tag>" pattern that
only matches if there is no "<tag>" in ".*".
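One way this might be expressed is with a negative lookahead, which gsub()
supports when perl = TRUE. The following is only a sketch under that
assumption: the variable names (x, pattern, cleaned) are made up for
illustration, and the pattern has only been checked against the single-line
example above, not against real files.

```r
# Example input from above, with stray closing tags after value2.
x <- "<tag>value1</tag><tag>value2</tag>some garbage</tag></tag><tag>value3</tag>"

# "</tag>", then any run of characters in which no "<tag>" starts,
# then another "</tag>". The (?!<tag>) lookahead is checked before
# every single character consumed by ".", so the match can never
# run across an opening tag.
pattern <- "</tag>(?:(?!<tag>).)*</tag>"

# Replace the whole matched span with one closing tag.
# perl = TRUE is required; the default regex engine has no lookahead.
cleaned <- gsub(pattern, "</tag>", x, perl = TRUE)

print(cleaned)
# [1] "<tag>value1</tag><tag>value2</tag><tag>value3</tag>"
```

Note that "." does not match a newline by default, so if the garbage can span
several lines the pattern would probably need a "(?s)" prefix, and the file
would have to be held in a single string rather than one element per line.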

Thanks in advance for your ideas,

Uli


