[R] Apply function to every 20 rows between pairs of columns in a matrix
arun
smartpink111 at yahoo.com
Tue Nov 12 04:43:35 CET 2013
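Hi,
set.seed(25)
dat1 <- as.data.frame(matrix(sample(c("A","T","G","C"),46482*56,replace=TRUE),ncol=56,nrow=46482),stringsAsFactors=FALSE)
# bin index 0,1,2,... for every 20 rows; a numeric index keeps the bins in row order
# (character labels would be sorted as "1","10","100",...)
lst1 <- split(dat1,(seq_len(nrow(dat1))-1)%/%20)
# per bin: proportion of matching alleles between each founder (cols 1:8) and each sample (cols 9:56)
res <- lapply(lst1,function(x) sapply(x[,1:8],function(y) sapply(x[,9:56], function(z) sum(y==z)/20)))
length(res)
#[1] 2325 ### check here: 2324 full bins plus a final bin of only 2 rows (46482 = 2324*20 + 2), which is still divided by 20
dim(res[[1]])
#[1] 48 8 ### rows = samples, columns = founders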
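The result above is a list with one 48 x 8 matrix per bin, whereas the question asks for one matrix per sample with bins in rows and founders in columns. The per-bin matrices can be rearranged into that shape. What follows is only a minimal sketch on a small simulated data set: the names n_rows, bin_size, bin_id, per_bin and per_sample are illustrative, and mean(y == z) is used so that a short final bin is divided by its own row count rather than by 20, which is an assumption about what is wanted.

```r
set.seed(25)
n_rows   <- 105   # toy size for illustration; the real data has 46482 rows
bin_size <- 20
dat <- as.data.frame(matrix(sample(c("A","T","G","C"), n_rows * 56, replace = TRUE),
                            ncol = 56), stringsAsFactors = FALSE)

# numeric bin index (0,0,...,1,1,...) keeps the bins in row order
bin_id <- (seq_len(n_rows) - 1) %/% bin_size
lst <- split(dat, bin_id)

# one 48 x 8 matrix per bin (rows = samples, columns = founders)
per_bin <- lapply(lst, function(x)
  sapply(x[, 1:8], function(y) sapply(x[, 9:56], function(z) mean(y == z))))

# one (number of bins) x 8 matrix per sample
per_sample <- lapply(seq_len(48), function(i)
  do.call(rbind, lapply(per_bin, function(m) m[i, ])))
names(per_sample) <- names(dat)[9:56]

length(per_sample)
#[1] 48
dim(per_sample[[1]])
#[1] 6 8
```

On the full data the same steps would give 48 matrices of 2325 rows each, the last row coming from the 2-row final bin.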
A.K.
Hi all, I have a set of genetic SNP data that looks like
Founder1 Founder2 Founder3 Founder4 Founder5 Founder6 Founder7 Founder8 Sample1 Sample2 Sample3 Sample...
A A A T T T T T A T A T
A A A T T T T T A T A T
A A A T T T T T A T A T
A A A T T T T T A T A T
A A A T T T T T A T A T
A A A T T T T T A T A T
A A A T T T T T A T A T
A A A T T T T T A T A T
A A A T T T T T A T A T
A A A T T T T T A T A T
A A A T T T T T A T A T
A A A T T T T T A T A T
The matrix has 46482 rows and 56 columns. I need to
first bin the matrix into blocks of 20 rows, then compare each of the first 8
columns (founders) to each of columns 9-56 (samples), and divide the number of
matching letters/alleles by the number of rows in the bin (20). Ultimately I
need 48 matrices, each 8 columns by 2342 rows, which are essentially similarity
matrices. I have tried to extract each pair separately with something like
"length(cbind(odd[,9],odd[,1])[cbind(odd[,9],cbind(odd[,9],odd[,1])[,1])[,1]=="T"
& cbind(odd[,9],odd[,1])[,2]=="T",])/nrow(cbind(odd[,9],odd[,1]))"
but this is nowhere near efficient, and I do not know of a
faster way of applying the function to every 20 rows and across multiple
pairs.
In the example given above, if the rows were all identical as
shown across 20 rows, then the first row of the matrix for Sample1 would
be (one value per founder)
1 1 1 0 0 0 0 0
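The arithmetic of this example can be checked on a toy bin. This is only a sketch: the objects row1, bin and sim are illustrative names, and the bin is built from the founder and Sample1 letters shown in the example rows, giving one value per founder.

```r
# 20 identical rows: founders 1-3 are "A", founders 4-8 are "T", Sample1 is "A"
row1 <- c(rep("A", 3), rep("T", 5), "A")
bin <- as.data.frame(matrix(rep(row1, each = 20), nrow = 20), stringsAsFactors = FALSE)
names(bin) <- c(paste0("Founder", 1:8), "Sample1")

# proportion of the 20 rows in which Sample1 matches each founder
sim <- sapply(bin[, 1:8], function(f) mean(f == bin$Sample1))
unname(sim)
#[1] 1 1 1 0 0 0 0 0
```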
More information about the R-help
mailing list