[R] How to make this for() loop memory efficient?

Steve Lianoglou mailinglist.honeypot at gmail.com
Wed Jan 11 01:38:56 CET 2012


Yeah -- just fired off an apology email before this landed in my inbox.

Sometimes I'm better off not trying to help at all -- this was one of
those cases ;-)

Whatever I was trying to do was clearly going down the wrong trail.

Thankfully, you're on top of it though.

Sorry for the spam,
-steve

On Tue, Jan 10, 2012 at 7:33 PM, Ray Brownrigg
<Ray.Brownrigg at ecs.vuw.ac.nz> wrote:
> Steve:
>
> I don't understand why you couldn't get the original code working.  You just have to
> notice that one comment overflows its line.
>
> However I couldn't get your code to match the output of the original - almost, but not
> quite!
>
> Ray
>
> On Wed, 11 Jan 2012, Steve Lianoglou wrote:
>> I'm having a really difficult time understanding what you're trying to
>> get -- copy and pasting your code is failing to run, and your question
>> isn't clear, i.e.:
>>
>> "For each phone call that BEGINS with the module which is denoted by 81
>> (i.e. of the form 81X,XXX), what is the expected number of modules in these
>> calls?"
>>
>> How does one calculate the expected number of "modules" in this
>> module? What does that even mean?
>>
>> Anyway, here's some code using your `data` data.frame that calculates the
>> number of unique calls and other statistics on the "call id" within
>> each module prefix. I'm using both data.table and plyr ... there are
>> no for loops.
>>
>> You will want to do `whatever it is you really want to do` inside the
>> "blocks" below.
>>
>> ## R code
>> data <- transform(data, module.prefix=substring(modules, 1, 2))
>>
>> ## take a look at `data` now
>>
>> ## calculate "stuff" inside each module.prefix using data.table
>> library(data.table)
>> xx <- data.table(data, key="module.prefix")
>>
>> ans <- xx[, {
>>   ## the columns of the particular subset of your data.table
>>   ## are "injected" into the scope for this expression block
>>   ## which is where the `calls` variable below comes from
>>   tabled <- table(as.character(calls))
>>   list(unique.calls=length(tabled), min=min(tabled),
>>        median=as.numeric(median(tabled)), max=max(tabled))
>>   ## you will want to return your own list of "stuff"
>> }, by='module.prefix']
>>
>>
>> ## with plyr
>> library(plyr)
>> ans <- ddply(data, "module.prefix", function(x) {
>>   ## `x` is a data.frame whose rows all share the same module.prefix
>>   ## do whatever you want with it here
>>   tabled <- table(as.character(x$calls))
>>   c(unique.calls=length(tabled), min=min(tabled),
>>     median=median(tabled), max=max(tabled))
>> })
>>
>> You'll have to read up on the particulars of data.table and plyr. Both
>> are really powerful packages ... you should get familiar with at least
>> one.
>>
>> plyr is a bit more flexible in some ways.
>>
>> data.table is a bit more strict (cf. the need for
>> `as.numeric(median(tabled))`), but also tends to be (much) faster when
>> working over large datasets.
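
A small base-R sketch of why the `as.numeric()` wrapper matters there, assuming the issue is that data.table wants each group to return the same column types. The two count vectors are made up for illustration and stand in for the per-group `tabled` counts:

```r
## hypothetical per-group call counts, as plain integer vectors
odd.group  <- c(1L, 2L, 4L)   # odd number of counts
even.group <- c(1L, 2L)       # even number of counts

## with an odd length, median() returns the middle element itself,
## so the result stays an integer
odd.type <- typeof(median(odd.group))

## with an even length, median() averages the two middle elements,
## so the result is a double
even.type <- typeof(median(even.group))

## coercing with as.numeric() gives a double in both cases
odd.fixed  <- typeof(as.numeric(median(odd.group)))
even.fixed <- typeof(as.numeric(median(even.group)))

print(c(odd=odd.type, even=even.type))
print(c(odd=odd.fixed, even=even.fixed))
```

Without the coercion, the type of the `median` column could differ from one module.prefix group to the next, depending on how many distinct calls each group happens to contain.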
>>
>> HTH,
>> -steve
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


