[R] splitting column into two
arun
smartpink111 at yahoo.com
Mon Mar 11 07:15:49 CET 2013
Hi,
Try this:
dat1<- read.table(text="
V1,V2,V3,V4,V5,V6,V7
chr1,564563,564598,564588 564589,1336,+,134
chr1,564620,564649,564644 564645,94,+,10
chr1,565369,565404,565371 565372,217,+,8
chr1,565463,565541,565480 565481,1214,+,15
chr1,565653,565697,565662 565663,1031,+,28
chr1,565861,565922,565883 565884,316,+,12
",sep=",",header=TRUE,stringsAsFactors=FALSE)
library(reshape2)
# Drop the original V4 (column 4) and append its two halves as numeric columns
dat2 <- cbind(dat1[, -4], colsplit(dat1$V4, pattern = " ", names = c("peak_start", "peak_end")))
dat2
# V1 V2 V3 V5 V6 V7 peak_start peak_end
#1 chr1 564563 564598 1336 + 134 564588 564589
#2 chr1 564620 564649 94 + 10 564644 564645
#3 chr1 565369 565404 217 + 8 565371 565372
#4 chr1 565463 565541 1214 + 15 565480 565481
#5 chr1 565653 565697 1031 + 28 565662 565663
#6 chr1 565861 565922 316 + 12 565883 565884
library(data.table)
datNew<- data.table(dat2)
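If the two new columns should sit where V4 was instead of at the end, one option is to rebuild the column order by position. This is only a minimal base R sketch, assuming V4 always holds exactly two space-separated numbers; the small dat1 below and the names parts, peaks, pos and dat3 are made up for illustration:

```r
# Hypothetical sample data mirroring the layout of the example above
dat1 <- data.frame(
  V1 = c("chr1", "chr1"),
  V2 = c(564563L, 564620L),
  V3 = c(564598L, 564649L),
  V4 = c("564588 564589", "564644 564645"),
  V5 = c(1336L, 94L),
  stringsAsFactors = FALSE
)

# Split V4 on the space; each list element holds the two pieces of one row
parts <- strsplit(dat1$V4, " ", fixed = TRUE)
peaks <- data.frame(
  peak_start = as.numeric(vapply(parts, `[`, character(1), 1)),
  peak_end   = as.numeric(vapply(parts, `[`, character(1), 2))
)

# Rebuild the data frame with the new columns in V4's original position
pos  <- match("V4", names(dat1))
dat3 <- cbind(dat1[seq_len(pos - 1)], peaks, dat1[-seq_len(pos)])
names(dat3)
```

Looking up the position with match() rather than hard-coding 4 should also work when V4 is somewhere in the middle of several hundred columns.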
A.K.
----- Original Message -----
From: "deconstructed.morning at gmail.com" <deconstructed.morning at gmail.com>
To: smartpink111 at yahoo.com
Cc:
Sent: Sunday, March 10, 2013 5:48 PM
Subject: Re: splitting column into two
Hello,
I saw your solution for this question and I want to ask what I should do when I have a very large file that looks like this:
> clusters<-data.table(CTSS[, grep("V1$|V2$|V3$|V4$|V5$|V6$|V7$", names(CTSS))])
> head(clusters)
V1 V2 V3 V4 V5 V6 V7
1: chr1 564563 564598 564588 564589 1336 + 134
2: chr1 564620 564649 564644 564645 94 + 10
3: chr1 565369 565404 565371 565372 217 + 8
4: chr1 565463 565541 565480 565481 1214 + 15
5: chr1 565653 565697 565662 565663 1031 + 28
6: chr1 565861 565922 565883 565884 316 + 12
What I want is to replace column V4, which contains two numbers separated by a space, with two numeric columns. I have tried this:
new <- cbind(CTSS,colsplit(CTSS$V4, ' ', c('peak_start', 'peak_end')) )
but instead of replacing the column, it keeps V4 as it is and adds the two new columns at the end (after the existing 625 columns). Please let me know if you have a better solution.
Thank you,
Nanami
<quote author='arun kirshna'>
Hi,
Maybe this helps:
dat1<-read.table(text="
0111 0214 0203 0404 1112 0513 0709 1010 0915 0813
0112 0314 0204 0504 1132 0543 0789 1020 0965 0823
",sep="",header=FALSE,colClasses=rep("character",10))
res<-do.call(data.frame,lapply(dat1,function(x)
do.call(rbind,lapply(strsplit(x,""),function(y)
c(paste0(y[1],y[2]),paste0(y[3],y[4]))))))
colnames(res)<-paste0("V",1:20)
res
# V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20
#1 01 11 02 14 02 03 04 04 11 12 05 13 07 09 10 10 09 15 08 13
#2 01 12 03 14 02 04 05 04 11 32 05 43 07 89 10 20 09 65 08 23
A.K.
</quote>
Quoted from:
http://r.789695.n4.nabble.com/splitting-column-into-two-tp4656108p4656111.html