[R] sample and rearrange
David Winsemius
dwinsemius at comcast.net
Thu May 20 00:24:21 CEST 2010
On May 19, 2010, at 5:01 PM, Wu Gong wrote:
>
> It took me a day to make sense of Jim's code :(
>
> Hope my comments will help.
>
> ## Transform data to matrix
> x <- as.matrix(x)
>
> ## Apply function to each row
> ## Create a function to rearrange bases
> result <- apply(x, 1, function(eachrow){
>
> ## Split each gene to bases
> ## Exclude the first column which is id
> bases <- strsplit(eachrow[-1], '')
>
> ## Transform list to matrix
> ## Because the result of function strsplit is a list
> bases <- do.call(rbind,bases)
>
> ## Recombine bases by connecting all bases in each column
> recombine <- apply(bases, 2, paste, collapse="")
>
> ## Add id
> ## Transpose recombine
> cbind(eachrow[1], t(recombine))
> })
>
> ## Transpose the result matrix
> result <- t(result)
It will come more quickly as you learn more. I also looked at Jim's
solution by pulling it apart, although I did not spend a whole day at
it, maybe ten minutes. I thought a three-line version was more
informative, because it did not make everything scroll off the console:
> x <- read.table(textConnection("SampleID A1 A2 A3 A4
+ GM920222 GATTGCC GATTGCC GATAGAC GATAGAC
+ GM930040 GTCATCA GAGTGCA ACTATAA GATTGCC
+ GM930040 GTCATCA GAGTGCA ACTATAA GATTGCC"), header=TRUE, as.is=TRUE)
> x <- as.matrix(x)
> t(apply(x, 1, function(.row){
+ # separate characters
+ z <- do.call(rbind, strsplit(.row[-1], ''))
+ # combine each column
+ z.col <- t(apply(z, 2, paste, collapse=''))
+ # add the ID
+ cbind(.row[1], z.col)
+ }))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] "GM920222" "GGGG" "AAAA" "TTTT" "TTAA" "GGGG" "CCAA" "CCCC"
[2,] "GM930040" "GGAG" "TACA" "CGTT" "ATAT" "TGTG" "CCAC" "AAAC"
[3,] "GM930040" "GGAG" "TACA" "CGTT" "ATAT" "TGTG" "CCAC" "AAAC"
# I usually see if I can get the inner-most function to work:
> z <- do.call(rbind, strsplit(x[1,], ''))
Warning message:
In function (..., deparse.level = 1) :
number of columns of result is not a multiple of vector length (arg 2)
> z
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
SampleID "G" "M" "9" "2" "0" "2" "2" "2"
A1 "G" "A" "T" "T" "G" "C" "C" "G"
A2 "G" "A" "T" "T" "G" "C" "C" "G"
A3 "G" "A" "T" "A" "G" "A" "C" "G"
A4 "G" "A" "T" "A" "G" "A" "C" "G"
# So I guess I didn't get an exact replica, since Jim had excluded the
# first element in the row
> z <- do.call(rbind, strsplit(x[1,-1], '')) # there ... cleaner
> z
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
A1 "G" "A" "T" "T" "G" "C" "C"
A2 "G" "A" "T" "T" "G" "C" "C"
A3 "G" "A" "T" "A" "G" "A" "C"
A4 "G" "A" "T" "A" "G" "A" "C"
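For anyone who wants to poke at that inner step outside the console session, here is a minimal sketch. The vector `genes` is a hand-typed stand-in for `x[1, -1]` (an assumption for illustration, not an object from the thread). It shows that strsplit() hands back a named list and that do.call(rbind, ...) stacks the list elements as rows.

```r
# Hand-typed stand-in for x[1, -1] from the session above (illustrative only)
genes <- c(A1 = "GATTGCC", A2 = "GATTGCC", A3 = "GATAGAC", A4 = "GATAGAC")

# strsplit() returns a named list: one character vector of bases per gene
bases <- strsplit(genes, "")
str(bases)

# do.call(rbind, bases) behaves like rbind(A1 = bases$A1, A2 = bases$A2, ...),
# so each gene becomes one row and each base position one column
z <- do.call(rbind, bases)
dim(z)
```

Because every gene here has seven bases, rbind() has nothing to recycle and no warning appears.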
That helped me understand what was going on in the middle of the
function. Now I wondered if the transpose could be avoided, so I
tried cbind instead of rbind:
> z <- do.call(cbind, strsplit(x[1,-1], ''))
> z
A1 A2 A3 A4
[1,] "G" "G" "G" "G"
[2,] "A" "A" "A" "A"
[3,] "T" "T" "T" "T"
[4,] "T" "T" "A" "A"
[5,] "G" "G" "G" "G"
[6,] "C" "C" "A" "A"
[7,] "C" "C" "C" "C"
> z.col <- apply(z, 2, paste, collapse='')
> z.col
A1 A2 A3 A4
"GATTGCC" "GATTGCC" "GATAGAC" "GATAGAC"
## Nope, that does not work: it just pastes each gene back together.
## So try apply over the rows (MARGIN = 1) ...
> z.col <- apply(z, 1, paste, collapse='')
> z.col
[1] "GGGG" "AAAA" "TTTT" "TTAA" "GGGG" "CCAA" "CCCC"
## OK that worked. Now see if it works inside the whole sequence:
> x <- as.matrix(x)
> t(apply(x, 1, function(.row){
+ # separate characters
+ z <- do.call(cbind, strsplit(.row[-1], ''))
+ # combine each row
+ z.col <- apply(z, 1, paste, collapse='')
+ # add the ID
+ cbind(.row[1], z.col)
+ }))
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] "GM920222" "GM920222" "GM920222" "GM920222" "GM920222" "GM920222" "GM920222"
[2,] "GM930040" "GM930040" "GM930040" "GM930040" "GM930040" "GM930040" "GM930040"
[3,] "GM930040" "GM930040" "GM930040" "GM930040" "GM930040" "GM930040" "GM930040"
[,8] [,9] [,10] [,11] [,12] [,13] [,14]
[1,] "GGGG" "AAAA" "TTTT" "TTAA" "GGGG" "CCAA" "CCCC"
[2,] "GGAG" "TACA" "CGTT" "ATAT" "TGTG" "CCAC" "AAAC"
[3,] "GGAG" "TACA" "CGTT" "ATAT" "TGTG" "CCAC" "AAAC"
Well, not exactly. z.col is a plain vector here, so cbind recycled the ID
against each of its seven elements and every row came back as 14 values.
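The recycling behind those 14 columns can be reproduced in isolation. A minimal sketch, with `id` and `z.col` for the first sample typed in by hand (the names `tall` and `wide` are made up for illustration):

```r
# Values for the first sample, typed in by hand for illustration
id    <- "GM920222"
z.col <- c("GGGG", "AAAA", "TTTT", "TTAA", "GGGG", "CCAA", "CCCC")

# cbind() treats a plain vector as a column, so the single id is
# recycled to match the seven elements of z.col
tall <- cbind(id, z.col)
dim(tall)

# t() turns z.col into a one-row matrix, so cbind() adds the id once
wide <- cbind(id, t(z.col))
dim(wide)

# apply() flattens each per-row result column by column, which is why
# the 7 x 2 version shows up as seven ids followed by seven strings
as.vector(tall)
```

The one-row version is what the original solution builds, and it flattens to the eight values wanted per sample.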
> x <- as.matrix(x)
> t(apply(x, 1, function(.row){
+ # separate characters
+ z <- do.call(cbind, strsplit(.row[-1], ''))
+ # combine each row
+ z.col <- apply(z, 1, paste, collapse='')
+ # add the ID, with z.col transposed into a single row
+ cbind(.row[1], t(z.col))
+ }))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] "GM920222" "GGGG" "AAAA" "TTTT" "TTAA" "GGGG" "CCAA" "CCCC"
[2,] "GM930040" "GGAG" "TACA" "CGTT" "ATAT" "TGTG" "CCAC" "AAAC"
[3,] "GM930040" "GGAG" "TACA" "CGTT" "ATAT" "TGTG" "CCAC" "AAAC"
So I got to the same place but didn't really achieve any savings.
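One possible variation, not something posted in the thread: since apply() flattens whatever the function returns, the function could return a plain vector built with c() and skip both the inner cbind and the inner t(). The sketch below assumes a two-row copy of the sample data and a made-up helper name `rearrange`. It uses read.table(text = ...), which needs a reasonably recent R. The outer t() is still required.

```r
# Two-row copy of the sample data from the thread
x <- as.matrix(read.table(text = "SampleID A1 A2 A3 A4
GM920222 GATTGCC GATTGCC GATAGAC GATAGAC
GM930040 GTCATCA GAGTGCA ACTATAA GATTGCC", header = TRUE, as.is = TRUE))

# Hypothetical helper: returns the id followed by one string per base position
rearrange <- function(.row) {
  # one column per gene, one row per base position
  z <- do.call(cbind, strsplit(.row[-1], ""))
  # paste across genes at each position, then prepend the id
  c(.row[1], apply(z, 1, paste, collapse = ""))
}

# apply() returns one column per sample, so transpose once at the end;
# unname() drops the dimnames that come from the named id element
result <- unname(t(apply(x, 1, rearrange)))
result
```

Whether this counts as a saving is a matter of taste: it removes one transpose per row but gives the same matrix.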
>
> -----
> A R learner.
David "also still learning" Winsemius, MD
West Hartford, CT