[R] how to split row elements [1] and [2] of a string variable A via strsplit and sapply
Aldi
aldi at dsgmail.wustl.edu
Fri Sep 11 17:11:43 CEST 2015
Thank you Jim and Bert for your suggestions.
Following is the final version used:
### Original tiny test data from Aldi Kraja, 9.11.2015.
### Purpose: split A into elements 1 and 2; the third element of A is not needed.
### Assign elements one and two to vectors C and D of the same data.frame.
### Do work similar to what the SAS SCAN function would have done:
### C=SCAN(x,1,":") ; D=SCAN(x,2,":") ;
### Jim Holtman suggested
### temp <- strsplit(x$A, ":")
### x$C <- sapply(temp, '[[', 1)
### x$D <- sapply(temp, '[[', 2)
### Bert Gunter suggested:
### do.call(rbind,strsplit(x[[1]],":"))[,-3]
### Start of script: a full R solution:
x <- read.table(text = "A B
1:29439275 0.46773514
5:85928892 0.81283052
10:128341232 0.09332543
1:106024283:ID 0.36307805
3:62707519 0.42657952
2:80464120 0.89125094", header = TRUE, as.is = TRUE)
x$A <- as.character(x$A)
temp <- strsplit(x$A,":")
x$C <- sapply(temp,'[[',1)
x$D <- sapply(temp,'[[',2)
x$C <- as.numeric(x$C)
x$D <- as.numeric(x$D)
### Final results:
x
### end of the script
#                A          B  C         D
#1     1:29439275 0.46773514  1  29439275
#2     5:85928892 0.81283052  5  85928892
#3   10:128341232 0.09332543 10 128341232
#4 1:106024283:ID 0.36307805  1 106024283
#5     3:62707519 0.42657952  3  62707519
#6     2:80464120 0.89125094  2  80464120
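
For anyone who wants something closer to a reusable SCAN-style call, the strsplit/sapply steps above could be wrapped in a small helper. This is only a sketch: the name scan_field and its arguments are hypothetical (not from Jim's or Bert's replies), and it only loosely mimics SAS SCAN (a single fixed separator, no special handling of repeated delimiters). It uses `[` rather than '[[' so that a row with fewer than n pieces gives NA instead of an error.

```r
# Hypothetical helper (an assumption, not part of the thread): return the
# n-th field of each string in x, split on a fixed separator.
scan_field <- function(x, n, sep = ":") {
  parts <- strsplit(as.character(x), sep, fixed = TRUE)
  # `[` (unlike `[[`) yields NA when a string has fewer than n fields
  vapply(parts, `[`, character(1), n)
}

# Small made-up example; the last row has no second field on purpose
x <- data.frame(A = c("1:29439275", "1:106024283:ID", "7"),
                stringsAsFactors = FALSE)
x$C <- as.numeric(scan_field(x$A, 1))
x$D <- as.numeric(scan_field(x$A, 2))
x
```

With the six-row test data above, this should give the same C and D as the script, since every row there has at least two fields.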
With best wishes,
Aldi
On 9/10/2015 1:35 PM, Bert Gunter wrote:
> ...
> Alternatively, you can avoid the looping (i.e. sapply) altogether by:
>
> do.call(rbind,strsplit(x[[1]],":"))[,-3]
>
>
> [,1] [,2]
> [1,] "1" "29439275"
> [2,] "5" "85928892"
> [3,] "10" "128341232"
> [4,] "1" "106024283"
> [5,] "3" "62707519"
> [6,] "2" "80464120"
>
> These can then be added to the existing frame, converted to numeric, etc.
>
> Cheers,
> Bert
> Bert Gunter
>
> "Data is not information. Information is not knowledge. And knowledge
> is certainly not wisdom."
> -- Clifford Stoll
>
>
> On Thu, Sep 10, 2015 at 11:05 AM, jim holtman <jholtman at gmail.com> wrote:
>> try this:
>>
>>
>>> x <- read.table(text = "A B
>> + 1:29439275 0.46773514
>> + 5:85928892 0.81283052
>> + 10:128341232 0.09332543
>> + 1:106024283:ID 0.36307805
>> + 3:62707519 0.42657952
>> + 2:80464120 0.89125094", header = TRUE, as.is = TRUE)
>>> temp <- strsplit(x$A, ":")
>>> x$C <- sapply(temp, '[[', 1)
>>> x$D <- sapply(temp, '[[', 2)
>>>
>>> x
>> A B C D
>> 1 1:29439275 0.46773514 1 29439275
>> 2 5:85928892 0.81283052 5 85928892
>> 3 10:128341232 0.09332543 10 128341232
>> 4 1:106024283:ID 0.36307805 1 106024283
>> 5 3:62707519 0.42657952 3 62707519
>> 6 2:80464120 0.89125094 2 80464120
>>
>>
>>
>>
>> Jim Holtman
>> Data Munger Guru
>>
>> What is the problem that you are trying to solve?
>> Tell me what you want to do, not how you want to do it.
>>
>> On Thu, Sep 10, 2015 at 1:46 PM, aldi <aldi at wustl.edu> wrote:
>>
>>> Hi,
>>> I have a data.frame x1 with a variable A that needs to be split into
>>> element 1 and element 2, where the separator is ":". Sometimes there are
>>> three elements in A, but I do not need the third element.
>>>
>>> Since R does not have a SCAN function as in SAS (C=scan(A,1,":");
>>> D=scan(A,2,":");), I am using a combination of strsplit and sapply. If I
>>> do not use the index [i], then R captures the full vector. Instead, I need
>>> to capture the first and the second element row by row and create two new
>>> variables, C and D, from them.
>>> Right now, in the loop over i, C is captured correctly, but D is
>>> missing because the variable AA does not contain it. Any suggestions?
>>> Thank you in advance, Aldi
>>>
>>> A B
>>> 1:29439275 0.46773514
>>> 5:85928892 0.81283052
>>> 10:128341232 0.09332543
>>> 1:106024283:ID 0.36307805
>>> 3:62707519 0.42657952
>>> 2:80464120 0.89125094
>>>
>>> x1<-read.table(file='./test.txt',head=T,sep='\t')
>>> x1$A <- as.character(x1$A)
>>>
>>> for(i in 1:length(x1$A)){
>>>
>>> x1$AA[i] <- as.numeric(unlist(strsplit(x1$A[i],':')))
>>>
>>> x1$C[i] <- sapply(x1$AA[i],function(x)x[1])
>>> x1$D[i] <- sapply(x1$AA[i],function(x)x[2])
>>> }
>>>
>>> x1
>>>
>>>
>>>
>>> > x1
>>> A B AA C D
>>> 1 1:29439275 0.46773514 1 1 NA
>>> 2 5:85928892 0.81283052 5 5 NA
>>> 3 10:128341232 0.09332543 10 10 NA
>>> 4 1:106024283:ID 0.36307805 1 1 NA
>>> 5 3:62707519 0.42657952 3 3 NA
>>> 6 2:80464120 0.89125094 2 2 NA
>>>
>>>
>>> --
>>>
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>