[R] Efficiency challenge: MANY subsets
Johannes Graumann
johannes_graumann at web.de
Fri Jan 16 14:06:36 CET 2009
Hello,
I have a list of character vectors like this:
sequences <- list(
  c("M","G","L","W","I","S","F","G","T","P","P","S","Y","T","Y","L","L","I","M",
    "N","H","K","L","L","L","I","N","N","N","N","L","T","E","V","H","T","Y","F",
    "N","I","N","I","N","I","D","K","M","Y","I","H","*")
)
and another list of subset ranges like this:
indexes <- list(
  list(
    c(1,22),c(22,46),c(46,51),c(1,46),c(22,51),c(1,51)
  )
)
What I now want to do is to subset each entry in "sequences"
(e.g. sequences[[1]]) with all ranges in the corresponding nested list in
"indexes" (e.g. indexes[[1]]). Here is what I came up with:
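# Preallocate the result list rather than growing it inside the loop.
fragments <- vector("list", length(sequences))
for(iN in seq_along(sequences)){
  cat(iN, "\n")  # progress output
  # lapply() always returns a list; sapply() may simplify the result to a
  # matrix when all fragments happen to have the same length.
  fragments[[iN]] <- lapply(
    indexes[[iN]],
    function(x){
      sequences[[iN]][seq.int(x[1],x[2])]
    }
  )
}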
This works fine, but "sequences" contains thousands of entries and the
corresponding "indexes" are sometimes hundreds of ranges long, so this whole
process is EXTREMELY inefficient.
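One direction that might help, given here only as a rough sketch that has
not been benchmarked on the real data: collapse each sequence into a single
string and pull out all of its ranges with one vectorised substring() call.
The helper name extract_fragments is made up for illustration, the data are a
shortened stand-in for the example above, and the approach assumes every
element of a sequence is exactly one character.

```r
# Shortened stand-in for the example data, to keep the sketch self-contained.
sequences <- list(c("M","G","L","W","I","S","F","G","T","P"))
indexes <- list(list(c(1, 4), c(4, 10), c(1, 10)))

# Hypothetical helper (not from any package): extract all ranges of one
# sequence with a single vectorised substring() call.
# Assumes each element of 'sequence' is a single character.
extract_fragments <- function(sequence, ranges) {
  joined <- paste(sequence, collapse = "")             # one string per sequence
  starts <- vapply(ranges, function(r) r[1], numeric(1))
  ends   <- vapply(ranges, function(r) r[2], numeric(1))
  pieces <- substring(joined, starts, ends)            # vectorised over ranges
  strsplit(pieces, "", fixed = TRUE)                   # back to character vectors
}

# One list of fragments per sequence, paired with its own list of ranges.
fragments <- Map(extract_fragments, sequences, indexes)
```

If the fragments are only needed as strings, the final strsplit() step could
be dropped, which would presumably save further time.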
Would somebody out there take up the challenge and show me how to speed
this up?
Thanks for any hints,
Joh