[R] data manipulation

Thu Jan 12 12:23:31 CET 2012

Hi all,

I'm trying to do some data manipulation using R, but I'm a bit stuck. I have
to warn you, I'm a real R noob.

I have for example this file:

V1           V2      V3        V4     V5                 V6
1:156706559  rs8658  dbSNP_52  C/G/A  C=2996/G=7762/A=0  31.8803/20.2782/27.849
1:69116      none    none      A/G    A=1/G=611          0.0/0.2747/0.1634
1:69134      none    none      G/A    G=8/A=724          1.9108/0.4785/1.0929
1:69270      none    none      G/A    G=1896/A=888       10.2394/42.6562/31.8966

The format that I want this data in is:

V1  V2         V3      V4        V5  V6  V7    V8   V9
1   156706559  rs8658  dbSNP_52  C   A   2996  0    27.849
1   156706559  rs8658  dbSNP_52  G   A   7762  0    27.849
1   69116      none    none      A   G   1     611  0.1634
1   69134      none    none      G   A   8     724  1.0929
1   69270      none    none      G   A   1896  888  31.8966

So first separate column V1 by ":". This was done pretty easily.
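
A minimal sketch of one way to do this step on the example rows, using base R only. It is not necessarily identical to my actual script, and `dat`, `chrom` and `pos` are placeholder names made up for illustration:

```r
# Toy data standing in for column V1 of the example file.
dat <- data.frame(V1 = c("1:156706559", "1:69116", "1:69134", "1:69270"),
                  stringsAsFactors = FALSE)

# strsplit() returns one character vector per row; each has two parts here.
parts <- strsplit(dat$V1, ":", fixed = TRUE)

# Placeholder column names for the two pieces.
dat$chrom <- sapply(parts, `[`, 1)
dat$pos   <- as.numeric(sapply(parts, `[`, 2))
```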

After that separate column V4 by "/". This was a bit trickier, seeing as
some rows are longer than others, but I managed to do it with this code.
Probably a really lousy way to do it, but it worked. (Don't pay too much
attention to the column numbers, my original file has more columns)

        splittingAllele <- function(y) {

            ##### Splitting column 4 into variant and normal allele
            r <- strsplit(as.character(y$V4), "/")

            # The normal allele is the last element of each split
            d <- NULL
            d <- as.list(d)

            for (x in seq_along(r)) {
                d <- rbind(d, r[[x]][length(r[[x]])])
            }

            d <- as.character(unlist(d))
            d <- as.data.frame(d)

            # Store the normal allele in a new column 28
            y[,28] <- d
            y[,28] <- as.character(y[,28])

            # Drop the trailing "/X" so column 4 keeps only the variant allele(s);
            # this assumes every allele is a single character
            v4 <- as.character(y[,4])
            f <- as.data.frame(substr(v4, 1, nchar(v4)-2))
            test3 <- y[,c(1:3)]
            test3[,4] <- f
            test3[,5:28] <- y[,c(28,5:27)]

            # One row per variant allele: repeat the other columns to match
            r <- strsplit(as.character(test3[,4]), "/")
            p1 <- cbind(unlist(r), rep(as.character(test3[,1]), sapply(r, length)))
            p2 <- cbind(unlist(r), rep(as.character(test3[,2]), sapply(r, length)))
            p3 <- cbind(unlist(r), rep(as.character(test3[,3]), sapply(r, length)))
            p5 <- cbind(unlist(r), rep(as.character(test3[,5]), sapply(r, length)))
            p8 <- cbind(unlist(r), rep(as.character(test3[,8]), sapply(r, length)))
            p9 <- cbind(unlist(r), rep(as.character(test3[,9]), sapply(r, length)))

            test4 <- cbind(p1[,2], p2[,2], p3[,2], p3[,1], p5[,2], p8[,2], p9[,2])
            test4 <- as.data.frame(test4)

            # Remove duplicated rows
            test5 <- test4[!duplicated(test4),]

            return(test5)
        }
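
A more compact way to get the same variant/normal split might look like the sketch below. It is base R only, runs on toy data rather than my real file, and assumes (as in the example) that the last allele in V4 is the normal one. `dat`, `long`, `variant` and `normal` are placeholder names:

```r
# Toy data: only the columns needed for this step.
dat <- data.frame(V2 = c("rs8658", "none", "none", "none"),
                  V4 = c("C/G/A", "A/G", "G/A", "G/A"),
                  stringsAsFactors = FALSE)

alleles <- strsplit(dat$V4, "/", fixed = TRUE)

# Assumption: the last allele is the normal one, all others are variants.
normal   <- sapply(alleles, function(a) a[length(a)])
variants <- lapply(alleles, function(a) a[-length(a)])

# Repeat each input row once per variant allele.
n   <- sapply(variants, length)
idx <- rep(seq_len(nrow(dat)), n)

long <- data.frame(V2      = dat$V2[idx],
                   variant = unlist(variants),
                   normal  = normal[idx],
                   stringsAsFactors = FALSE)
```

Unlike the function above, this does not call `duplicated()`, so identical rows are kept.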

Now I want to separate column V5, but I'm stuck here. I think I can almost
use the exact same code as before, but I can't figure it out.
Any help please?
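
To make the goal concrete, this is roughly what the V5 step has to do for a single value. It is only a sketch in base R on one string from the example. `v5`, `pairs` and `counts` are names made up for illustration, and I have not worked out how to fold it into the function above:

```r
# One V5 value from the example file.
v5 <- "C=2996/G=7762/A=0"

# Split into "allele=count" pairs, then split each pair on "=".
pairs <- strsplit(strsplit(v5, "/", fixed = TRUE)[[1]], "=", fixed = TRUE)

# Named integer vector: allele -> count.
counts <- as.integer(sapply(pairs, `[`, 2))
names(counts) <- sapply(pairs, `[`, 1)

# Assumption: the last entry belongs to the normal allele.
normal_count   <- counts[length(counts)]
variant_counts <- counts[-length(counts)]
```

The variant counts would end up in V7 and the normal count in V8 of the desired output.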

Thank you in advance!
