how to split row elements [1] and [2] of a string variable A via strsplit and sapply
jim holtman
jholtman at gmail.com
Thu Sep 10 20:05:01 CEST 2015
try this:
> x <- read.table(text = "A B
+ 1:29439275 0.46773514
+ 5:85928892 0.81283052
+ 10:128341232 0.09332543
+ 1:106024283:ID 0.36307805
+ 3:62707519 0.42657952
+ 2:80464120 0.89125094", header = TRUE, as.is = TRUE)
>
> temp <- strsplit(x$A, ":")
> x$C <- sapply(temp, '[[', 1)
> x$D <- sapply(temp, '[[', 2)
>
> x
               A          B  C         D
1     1:29439275 0.46773514  1  29439275
2     5:85928892 0.81283052  5  85928892
3   10:128341232 0.09332543 10 128341232
4 1:106024283:ID 0.36307805  1 106024283
5     3:62707519 0.42657952  3  62707519
6     2:80464120 0.89125094  2  80464120
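
One caveat with the '[[' extraction above: it stops with a "subscript out of bounds" error if some value of A contains no ":" at all. Here is a minimal sketch of a more forgiving variant, assuming you would rather get NA for a missing field and want C and D as numbers rather than strings. The name nth_field and the sample vector A are made up for illustration; they are not part of base R or of the original data.

```r
# Hypothetical helper (not a base R function): return the n-th
# sep-separated field of each string, or NA where that field is absent.
nth_field <- function(s, n, sep = ":") {
  parts <- strsplit(s, sep, fixed = TRUE)
  # Single-bracket indexing yields NA past the end of a vector,
  # whereas '[[' would raise an error.
  vapply(parts, `[`, FUN.VALUE = character(1), n)
}

# Illustrative input; the last value deliberately has no separator.
A <- c("1:29439275", "1:106024283:ID", "7")

C <- as.numeric(nth_field(A, 1))
D <- as.numeric(nth_field(A, 2))
```

With the data frame from the session above, the same idea would read x$C <- as.numeric(nth_field(x$A, 1)), and likewise for D with n = 2.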
On Thu, Sep 10, 2015 at 1:46 PM, aldi <aldi at wustl.edu> wrote:
> Hi,
> I have a data.frame x1 in which a variable A needs to be split into
> element 1 and element 2, where the separator is ":". Sometimes there are
> three elements in A, but I do not need the third element.
>
> Since R does not have a SCAN function as in SAS (C=scan(A,1,":");
> D=scan(A,2,":");), I am using a combination of strsplit and sapply.
> If I do not use the index [i], R captures the full vector. Instead I need
> to capture the first and second elements row by row and create two new
> variables, C and D, from them.
> Right now, in the loop over i, C is captured correctly, but D is
> missing because the variable AA does not contain it. Any suggestions?
> Thank you in advance, Aldi
>
> A B
> 1:29439275 0.46773514
> 5:85928892 0.81283052
> 10:128341232 0.09332543
> 1:106024283:ID 0.36307805
> 3:62707519 0.42657952
> 2:80464120 0.89125094
>
> x1<-read.table(file='./test.txt',head=T,sep='\t')
> x1$A <- as.character(x1$A)
>
> for(i in 1:length(x1$A)){
>
> x1$AA[i] <- as.numeric(unlist(strsplit(x1$A[i],':')))
>
> x1$C[i] <- sapply(x1$AA[i],function(x)x[1])
> x1$D[i] <- sapply(x1$AA[i],function(x)x[2])
> }
>
> x1
>
>
>
> > x1
>                A          B AA  C  D
> 1     1:29439275 0.46773514  1  1 NA
> 2     5:85928892 0.81283052  5  5 NA
> 3   10:128341232 0.09332543 10 10 NA
> 4 1:106024283:ID 0.36307805  1  1 NA
> 5     3:62707519 0.42657952  3  3 NA
> 6     2:80464120 0.89125094  2  2 NA
>
>
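
As for why D comes out as NA in the quoted loop: a single vector element such as x1$AA[i] can hold only one value, so assigning the whole split result to it keeps the first piece and drops the rest (R issues a warning about the replacement length). A small sketch of that behaviour, using a stand-in vector called slot instead of the real data frame column; the variable names here are illustrative only:

```r
a <- "1:106024283:ID"
parts <- strsplit(a, ":", fixed = TRUE)[[1]]   # three pieces: "1", "106024283", "ID"

slot <- NA_real_   # stand-in for one element of a column such as x1$AA

# Assigning three values into one element keeps only the first.
# Warnings are suppressed here: one for "ID" not being a number,
# one for the mismatched replacement length.
suppressWarnings(slot[1] <- as.numeric(parts))

first  <- sapply(slot, function(x) x[1])   # the value that survived
second <- sapply(slot, function(x) x[2])   # nothing left to pick up, so NA
```

Splitting the whole column once and indexing the resulting list, as in the reply at the top, avoids the problem because each list element keeps all of its pieces.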
