[R] gsub: replacing a.*a if no occurrence of b in .*
Ulrich Keller
ulrich.keller at emacs.lu
Sat Feb 24 12:47:52 CET 2007
I am trying to read a number of XML files using xmlTreeParse(). Unfortunately,
some of them are malformed in a way that makes R crash. The problem is that
closing tags are sometimes repeated like this:
<tag>value1</tag><tag>value2</tag>some garbage</tag></tag><tag>value3</tag>
I want to preprocess the contents of the XML file using gsub() before feeding
them to xmlTreeParse() to clean them up, but I can't figure out how to do it.
What I need is something that transforms the example above into:
<tag>value1</tag><tag>value2</tag><tag>value3</tag>
In other words, I need some kind of "</tag>.*</tag>" pattern that only matches
if there is no "<tag>" in the ".*" part.
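
One way this might be sketched, assuming gsub() is called with perl = TRUE so
that negative lookahead is available: each character between the two closing
tags is allowed to match only if it does not begin a "<tag>". The helper name
clean_tags is hypothetical, and the sketch assumes that <tag> elements are
never nested and that the file contents are held in a single string.

```r
# Hypothetical helper: collapse "</tag>garbage</tag>" runs into one "</tag>".
# Assumes <tag> elements are never legitimately nested inside each other.
clean_tags <- function(x) {
  # (?s)             lets "." also match newlines, so garbage may span lines
  # (?:(?!<tag>).)*  any run of characters in which no "<tag>" begins
  pattern <- "(?s)</tag>(?:(?!<tag>).)*</tag>"
  gsub(pattern, "</tag>", x, perl = TRUE)
}

x <- "<tag>value1</tag><tag>value2</tag>some garbage</tag></tag><tag>value3</tag>"
clean_tags(x)
# [1] "<tag>value1</tag><tag>value2</tag><tag>value3</tag>"
```

The lookahead is what keeps the match from running across an opening "<tag>",
so a well-formed "</tag><tag>" boundary is left alone.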
Thanks in advance for your ideas,
Uli