[R] data manipulation
Rinde
rinde1285 at hotmail.com
Thu Jan 12 12:23:31 CET 2012
Hi all,
I'm trying to do some data manipulation in R, but I'm a bit stuck. I have
to warn you, I'm a real R noob.
For example, I have this file:
V1           V2      V3        V4     V5                 V6
1:156706559  rs8658  dbSNP_52  C/G/A  C=2996/G=7762/A=0  31.8803/20.2782/27.849
1:69116      none    none      A/G    A=1/G=611          0.0/0.2747/0.1634
1:69134      none    none      G/A    G=8/A=724          1.9108/0.4785/1.0929
1:69270      none    none      G/A    G=1896/A=888       10.2394/42.6562/31.8966
The format that I want this data in is:
V1  V2         V3      V4        V5  V6  V7    V8   V9
1   156706559  rs8658  dbSNP_52  C   A   2996  0    27.849
1   156706559  rs8658  dbSNP_52  G   A   7762  0    27.849
1   69116      none    none      A   G   1     611  0.1634
1   69134      none    none      G   A   8     724  10.929
1   69270      none    none      G   A   1896  888  318.966
So the first step is to separate column V1 at ":". That was pretty easy.
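For reference, a minimal sketch of what such a split can look like in base R. The data frame `df` and the new column names `chr` and `pos` are made up for illustration; only the V1 values come from the example file.

```r
# Hypothetical example data: only the V1 column from the post
df <- data.frame(V1 = c("1:156706559", "1:69116", "1:69134"),
                 stringsAsFactors = FALSE)

# Split each "chromosome:position" string at the colon
parts <- strsplit(df$V1, ":", fixed = TRUE)

# Every entry has exactly two pieces, so they can be stacked into a matrix
m <- do.call(rbind, parts)
df$chr <- m[, 1]               # illustrative column name
df$pos <- as.integer(m[, 2])   # illustrative column name
```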
The next step is to separate column V4 at "/". This was a bit trickier,
since some rows contain more alleles than others, but I managed to do it
with the code below. It's probably a really lousy way to do it, but it
worked. (Don't pay too much attention to the column numbers; my original
file has more columns.)
splittingAllele <- function(y) {
    ##### Splitting column 4 into variant and normal allele
    r <- strsplit(y$V4, "/")

    # Collect the last allele of every row and store it in column 28
    d <- NULL
    d <- as.list(d)
    for (x in 1:length(r)) {
        d <- rbind(d, r[[x]][length(r[[x]])])
    }
    d <- as.character(unlist(d))
    d <- as.data.frame(d)
    y[,28] <- d
    y[,28] <- as.character(y[,28])

    # Drop the last allele and its "/" from column 4
    # (this assumes the last allele is a single character)
    f <- as.data.frame(substr(y[,4], 1, nchar(y[,4])-2))

    # Rebuild the table: columns 1-3, remaining alleles, last allele, the rest
    test3 <- y[,c(1:3)]
    test3[,4] <- f
    test3[,5:28] <- y[,c(28,5:27)]

    # Repeat the other columns once for every remaining allele
    r <- strsplit(as.character(test3[,4]), "/")
    p1 <- cbind(unlist(r), rep(as.character(test3[,1]), sapply(r, length)))
    p2 <- cbind(unlist(r), rep(as.character(test3[,2]), sapply(r, length)))
    p3 <- cbind(unlist(r), rep(as.character(test3[,3]), sapply(r, length)))
    p5 <- cbind(unlist(r), rep(as.character(test3[,5]), sapply(r, length)))
    p8 <- cbind(unlist(r), rep(as.character(test3[,8]), sapply(r, length)))
    p9 <- cbind(unlist(r), rep(as.character(test3[,9]), sapply(r, length)))
    test4 <- cbind(p1[,2], p2[,2], p3[,2], p3[,1], p5[,2], p8[,2], p9[,2])
    test4 <- as.data.frame(test4)

    # Remove duplicated rows
    test5 <- test4[!duplicated(test4),]
    return(test5)
}
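The same idea as in the function above (keep the last allele aside, then produce one row per remaining allele) can probably be written more compactly. This is only a sketch on a cut-down, hypothetical data frame `y` with two columns; the names `variant`, `normal` and `long` are invented for the example.

```r
# Hypothetical example data using two V4 values from the post
y <- data.frame(V2 = c("rs8658", "none"),
                V4 = c("C/G/A", "A/G"),
                stringsAsFactors = FALSE)

alleles <- strsplit(y$V4, "/", fixed = TRUE)

# Last allele of each row, and everything before it
normal  <- vapply(alleles, function(a) a[length(a)], character(1))
variant <- lapply(alleles, function(a) a[-length(a)])

# Number of output rows that each input row expands into
n <- lengths(variant)

# Repeat the unchanged columns n times and line them up with the alleles
long <- data.frame(V2      = rep(y$V2, n),
                   variant = unlist(variant),
                   normal  = rep(normal, n),
                   stringsAsFactors = FALSE)
```

Using `rep()` with a vector of counts avoids the `duplicated()` cleanup step, because each input row is repeated exactly as many times as it has remaining alleles.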
Now I want to separate column V5, but I'm stuck here. I think I can almost
use the exact same code as before, but I can't figure it out.
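One way the V5 step might look, as a rough sketch only. It assumes that the last "allele=count" pair in V5 belongs to the last allele in V4, which is what the desired output above suggests. The helper `parse_counts` and all variable names are made up for this example.

```r
# Hypothetical example data: two V5 values from the post
v5 <- c("C=2996/G=7762/A=0", "A=1/G=611")

# Turn one "allele=count/allele=count" string into a named integer vector
parse_counts <- function(s) {
    pairs <- strsplit(s, "/", fixed = TRUE)[[1]]
    kv <- strsplit(pairs, "=", fixed = TRUE)
    counts <- as.integer(vapply(kv, `[`, character(1), 2))
    names(counts) <- vapply(kv, `[`, character(1), 1)
    counts
}

counts <- lapply(v5, parse_counts)

# Assumption: the last count goes with the last allele, the others with the rest
normal_count  <- vapply(counts, function(x) x[[length(x)]], integer(1))
variant_count <- lapply(counts, function(x) unname(x[-length(x)]))
n <- lengths(variant_count)

# One row per remaining allele, with the last count repeated alongside
long <- data.frame(variant_count = unlist(variant_count),
                   normal_count  = rep(normal_count, n))
```

Since `n` is computed the same way as for the alleles in V4, the resulting rows should line up with the rows produced by the V4 split, provided V4 and V5 list the alleles in the same order.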
Any help please??
Thank you in advance!
--
View this message in context: http://r.789695.n4.nabble.com/data-manipulation-tp4288663p4288663.html
Sent from the R help mailing list archive at Nabble.com.