[R] Split column of concatenated data
Street N.R.
N.R.STREET at soton.ac.uk
Wed Apr 25 16:06:54 CEST 2007
Hi
I have a column of concatenated information stored in an RG object (from the limma package). I need to split this information and then write the first two pieces of data in each case back into two columns of the RG object.
This is how I am currently doing it:
gene.info.split <- strsplit(RG$genes$Name, ",", fixed = TRUE)
for (h in seq_along(gene.info.split)) {
  RG$genes$ID[h] <- gene.info.split[[h]][1]
  RG$genes$Name[h] <- gene.info.split[[h]][2]
}
However, this is very slow and presumably 'messy'. The problem is that the number of comma-separated entries in the original Name column is inconsistent, so I cannot do
gene.info.split<-as.data.frame(strsplit(RG$genes$Name,",",fixed=TRUE))
because I get the error message
Error in data.frame(c("OligoCy3", "SP Control poplar 48pin", "A24", "no length information" :
arguments imply differing number of rows: 4, 6, 5, 1
I also can't figure out how to usefully put the [list] data into a matrix (my ignorance I am sure).
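One possible way to get the list into a matrix, sketched here under assumptions: pad every split element with NA up to the length of the longest one, then bind the padded vectors as rows. The `name` vector below is made-up example data standing in for RG$genes$Name, and `parts`, `n.max`, `padded` and `gene.info.mat` are hypothetical names used only for illustration.

```r
# Made-up example values standing in for RG$genes$Name (assumption)
name <- c("OligoCy3,SP Control poplar 48pin,A24,no length information",
          "a,b,c,d,e,f",
          "single")

parts <- strsplit(name, ",", fixed = TRUE)

# Length of the longest split entry
n.max <- max(vapply(parts, length, integer(1)))

# Indexing a vector past its end returns NA, so this pads short entries
padded <- lapply(parts, function(x) x[seq_len(n.max)])

# One row per original entry, one column per comma-separated item
gene.info.mat <- do.call(rbind, padded)
```

The result is a character matrix, so the first and second columns can be taken with gene.info.mat[, 1] and gene.info.mat[, 2].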
Ideally I would be able to put each comma-separated item into a column and then simply paste the first and second columns over the RG$genes$ID and RG$genes$Name columns respectively (and do away with the for loop).
Some cases in the original RG$genes$Name have only one piece of information (i.e. no commas), so I would need a way to fill any blanks with an NA value.
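If only the first two pieces are needed, a loop-free sketch might extract them directly from the split list. This is an illustration under assumptions, not a tested limma recipe: the `genes` data frame below is a made-up stand-in for RG$genes, and it assumes the Name column is character rather than a factor (otherwise wrap it in as.character() first).

```r
# Made-up stand-in for RG$genes (assumption; the real object comes from limma)
genes <- data.frame(Name = c("id1,geneA,extra", "id2,geneB", "control"),
                    stringsAsFactors = FALSE)

parts <- strsplit(genes$Name, ",", fixed = TRUE)

# x[1] and x[2] return NA when the element does not exist,
# so entries without a comma get NA in the second position
genes$ID   <- vapply(parts, function(x) x[1], character(1))
genes$Name <- vapply(parts, function(x) x[2], character(1))
```

With the example data, the third entry ("control") ends up with ID "control" and Name NA.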
If anyone can help me, it would be much appreciated.
Nat Street
---
Nathaniel Street
University of Southampton
Plants and Environment Lab
School of Biological Sciences
Basset Crescent East
Southampton
SO16 7PX
tel: +44 (0) 2380 594268
fax: +44 (0) 2380 594269
n.r.street at soton.ac.uk
http://www.populus.biol.soton.ac.uk/~nat
http://del.icio.us/n.r.street