[R] Decoding subscripts/superscripts from CSVs

Wed Jul 23 14:00:18 CEST 2008

On Tue, 2008-07-22 at 16:18 -0400, naw3 at duke.edu wrote:
> Hi,
> 
> I have a CSV file with various biological reactions. Subscripts, superscripts,
> and italics are encoded in carats, and I was wondering if R can actually
> recognize those and print actual superscripts, etc. Here's an example:
> 
>  <i>S</i>-adenosyl-L-methionine + rRNA  =  <i>S</i>-adenosyl-L-homocysteine +
> rRNA containing <i>N<sup>6</sup></i>-methyladenine
> 
Hi Nina,
Embedded formatting commands enclosed in angle brackets (a caret is ^)
are almost certainly from the SGML family of markup languages, and
probably from XML, which is becoming more common as a data format. If
you want R to display the formatting, you must translate the XML tags
into their plotmath equivalents. Here is a toy function for your example:

xml2pm<-function(xmlstring) {
 # <i>...</i> becomes italic(...)
 xmlstring<-gsub("<[iI]>","italic(",xmlstring)
 xmlstring<-gsub("</[Ii]>",")",xmlstring)
 # <sup>...</sup> becomes ^... (the closing tag is simply dropped)
 xmlstring<-gsub("<[Ss][Uu][Pp]>","^",xmlstring)
 xmlstring<-gsub("</[Ss][Uu][Pp]>","",xmlstring)
 return(xmlstring)
}
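
As a rough sketch of how the translated string could be used, the
snippet below applies the same four substitutions to one fragment of
your example and parses the result into a plotmath expression. The
helper name tags2plotmath and the choice of fragment are only
illustrative assumptions. The full reaction string would need more work
before parse() accepts it, since free text such as "rRNA containing" is
not valid R syntax, and a superscript like "2+" would have to be quoted.

```r
# Hypothetical helper: the same tag substitutions as the toy function.
tags2plotmath <- function(x) {
  x <- gsub("<[Ii]>", "italic(", x)        # opening italic tag
  x <- gsub("</[Ii]>", ")", x)             # closing italic tag
  x <- gsub("<[Ss][Uu][Pp]>", "^", x)      # opening superscript tag
  x <- gsub("</[Ss][Uu][Pp]>", "", x)      # closing superscript tag
  x
}

# One fragment of the example reaction (assumed here for illustration).
fragment <- "<i>N<sup>6</sup></i>"
pm <- tags2plotmath(fragment)

# Turn the string into an expression that plotmath can render.
label <- parse(text = pm)

# Draw it as a plot title when running interactively.
if (interactive()) {
  plot(1, 1, type = "n", main = label)
}
```

Here pm holds the string "italic(N^6)", and label can be passed to any
graphics function that accepts plotmath, such as text() or title().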

Jim