[R] Efficiency challenge: MANY subsets
jim holtman
jholtman at gmail.com
Fri Jan 16 22:15:06 CET 2009
Try this one; it handles a list of 7000 sequences in under 2 seconds:
> sequences <- list(
+ c("M","G","L","W","I","S","F","G","T","P","P","S","Y","T","Y","L","L","I",
+   "M","N","H","K","L","L","L","I","N","N","N","N","L","T","E","V","H","T",
+   "Y","F","N","I","N","I","N","I","D","K","M","Y","I","H","*")
+ )
>
> indexes <- list(
+ list(
+ c(1,22),c(22,46),c(46, 51),c(1,46),c(22,51),c(1,51)
+ )
+ )
>
> indexes <- rep(indexes,10)
> sequences <- rep(sequences,7000)
>
> system.time({
+ fragments <- lapply(indexes, function(.seq){
+ lapply(.seq, function(.range){
+ .range <- seq(.range[1], .range[2]) # build the index vector once; it is reused for every sequence
+ lapply(sequences, '[', .range)
+ })
+ })
+ })
user system elapsed
1.24 0.00 1.26
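Note that the code above applies every range list in "indexes" to every sequence (the result is 10 lists x 6 ranges x 7000 fragments), rather than pairing sequences[[i]] with indexes[[i]] as in the original question. A minimal sketch of the one-to-one pairing, assuming "indexes" has exactly one list of c(start, end) pairs per sequence, uses Map() to walk both lists in parallel and drops the explicit for loop:

```r
# Sketch: cut sequences[[i]] only with the ranges in indexes[[i]].
# Assumes length(indexes) == length(sequences) and that each range is c(start, end).
sequences <- list(
  c("M","G","L","W","I","S","F","G","T","P","P","S","Y","T","Y","L","L","I",
    "M","N","H","K","L","L","L","I","N","N","N","N","L","T","E","V","H","T",
    "Y","F","N","I","N","I","N","I","D","K","M","Y","I","H","*")
)
indexes <- list(
  list(c(1, 22), c(22, 46), c(46, 51), c(1, 46), c(22, 51), c(1, 51))
)

# Larger test set: 7000 sequences, each with its own (here identical) range list
sequences <- rep(sequences, 7000)
indexes   <- rep(indexes, 7000)

# Map() passes sequences[[i]] as `s` and indexes[[i]] as `ranges`;
# the inner lapply() cuts that one sequence with each of its ranges.
fragments <- Map(function(s, ranges) {
  lapply(ranges, function(r) s[seq.int(r[1], r[2])])
}, sequences, indexes)

fragments[[1]][[3]]  # positions 46..51 of the first sequence
```

Each fragments[[i]] is a list of character vectors, one per range in indexes[[i]]. lapply() is used in the inner step rather than sapply(), so the result is never simplified to a matrix when all ranges happen to have the same length.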
On Fri, Jan 16, 2009 at 3:16 PM, Johannes Graumann <johannes_graumann at web.de> wrote:
> Thanks. Very elegant, but it doesn't solve the problem of the outer "for" loop,
> since I would now rewrite the code like this:
>
> fragments <- list()
> for(iN in seq_along(sequences)){
>   cat(paste(iN,"\n"))
>   # use the iN-th sequence and its own ranges, not always the first entry
>   fragments[[iN]] <-
>     lapply(indexes[[iN]], function(g) sequences[[iN]][do.call(seq, as.list(g))])
> }
>
> It is still very slow for length(sequences) ~ 7000.
>
> Joh
>
> On Friday 16 January 2009 14:23:47 Henrique Dallazuanna wrote:
>> Try this:
>>
>> lapply(indexes[[1]], function(g)sequences[[1]][do.call(seq, as.list(g))])
>>
>> On Fri, Jan 16, 2009 at 11:06 AM, Johannes Graumann <johannes_graumann at web.de> wrote:
>> > Hello,
>> >
>> > I have a list of character vectors like this:
>> >
>> > sequences <- list(
>> >   c("M","G","L","W","I","S","F","G","T","P","P","S","Y","T","Y","L","L","I",
>> >     "M","N","H","K","L","L","L","I","N","N","N","N","L","T","E","V","H","T",
>> >     "Y","F","N","I","N","I","N","I","D","K","M","Y","I","H","*")
>> > )
>> >
>> > and another list of subset ranges like this:
>> >
>> > indexes <- list(
>> > list(
>> > c(1,22),c(22,46),c(46, 51),c(1,46),c(22,51),c(1,51)
>> > )
>> > )
>> >
>> > What I want to do now is subset each entry of "sequences" (e.g.
>> > sequences[[1]]) with all ranges in the corresponding low-level list of
>> > "indexes" (indexes[[1]]). Here is what I came up with:
>> >
>> > fragments <- list()
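>> > for(iN in seq_along(sequences)){
>> >   cat(paste(iN,"\n"))
>> >   # lapply() rather than sapply(): sapply() would collapse the fragments
>> >   # into a matrix if all ranges happened to have the same length
>> >   tmpFragments <- lapply(
>> >     indexes[[iN]],
>> >     function(x){
>> >       sequences[[iN]][seq.int(x[1],x[2])]
>> >     }
>> >   )
>> >   fragments[[iN]] <- tmpFragments
>> > }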
>> >
>> > This works fine, but "sequences" contains thousands of entries and the
>> > corresponding "indexes" are sometimes hundreds of ranges long, so the
>> > whole process is EXTREMELY inefficient.
>> >
>> > Is anybody out there up for the challenge of showing me how to speed
>> > this up?
>> >
>> > Thanks for any hints,
>> >
>> > Joh
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?