[R] How to make this for() loop memory efficient?

Steve Lianoglou mailinglist.honeypot at gmail.com
Wed Jan 11 01:35:44 CET 2012


Let me just reply to myself.

Sorry, it's funny how much I don't get this, but it appears Ray
follows what you're asking and has provided an answer -- scratch my
email, it seems to be way off.

(you should still learn plyr and/or data.table if you haven't yet, tho ;-)

Apologies,
-steve

On Tue, Jan 10, 2012 at 7:18 PM, Steve Lianoglou
<mailinglist.honeypot at gmail.com> wrote:
> I'm having a really difficult time understanding what you're trying to
> get -- your code fails to run when I copy and paste it, and your
> question isn't clear, i.e.:
>
> "For each phone call that BEGINS with the module which is denoted by 81
> (i.e. of the form 81X,XXX), what is the expected number of modules in these
> calls?"
>
> How does one calculate the expected number of "modules" in this
> module? What does that even mean?
>
> Anyway, here's some code using your `data` data.frame that calculates
> the number of unique calls and other statistics on the "call id" within
> each module prefix. I'm using both data.table and plyr ... there are
> no for loops.
>
> You will want to do `whatever it is you really want to do` inside the
> "blocks" below.
>
> ## R code
> data <- transform(data, module.prefix=substring(modules, 1, 2))
>
> ## take a look at `data` now
>
> ## calculate "stuff" inside each module.prefix using data.table
> library(data.table)
> xx <- data.table(data, key="module.prefix")
>
> ans <- xx[, {
>  ## the columns of the particular subset of your data.table
>  ## are "injected" into the scope for this expression block
>  ## which is where the `calls` variable below comes from
>  tabled <- table(as.character(calls))
>  list(unique.calls=length(tabled), min=min(tabled),
>       median=as.numeric(median(tabled)), max=max(tabled))
>  ## you will want to return your own list of "stuff"
> }, by='module.prefix']
>
>
> ## with plyr
> library(plyr)
> ans <- ddply(data, "module.prefix", function(x) {
>  ## `x` is a data.frame whose rows all share the same module.prefix
>  ## do whatever you want with it here
>  tabled <- table(as.character(x$calls))
>  c(unique.calls=length(tabled), min=min(tabled),
>    median=median(tabled), max=max(tabled))
> })
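
Since the original `data` isn't shown in this thread, here is a rough base-R sketch of the same per-prefix summary on made-up data, just to show the shape of the result. The toy rows and the helper name `summarise.calls` are assumptions for illustration only; the column names `calls` and `modules` are taken from the code above.

```r
## Toy stand-in for the poster's `data`; the values are invented.
data <- data.frame(
  calls   = c("a", "a", "b", "c", "c", "c", "d"),
  modules = c("81001", "81002", "81001", "82001", "82002", "82003", "82001"),
  stringsAsFactors = FALSE
)
data <- transform(data, module.prefix = substring(modules, 1, 2))

## Hypothetical helper: summarise the per-call row counts in one group.
summarise.calls <- function(calls) {
  tabled <- table(as.character(calls))
  c(unique.calls = length(tabled), min = min(tabled),
    median = as.numeric(median(tabled)), max = max(tabled))
}

## split() groups `calls` by prefix; sapply() returns one column per
## prefix, so t() turns that into one row per prefix.
ans <- t(sapply(split(data$calls, data$module.prefix), summarise.calls))
print(ans)
```

This does the same split-apply-combine that data.table and plyr do, only without their speed or convenience, so it is mostly useful for checking what the summary function returns on a small example.
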
>
> You'll have to read up on the particulars of data.table and plyr. Both
> are really powerful packages ... you should get familiar with at least
> one.
>
> plyr is a bit more flexible in some ways.
>
> data.table is a bit more strict (cf. the need for
> `as.numeric(median(tabled))`), but also tends to be (much) faster when
> working over large datasets.
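
A small base-R sketch of what is probably behind the `as.numeric()` call: `median()` on integer counts can return either an integer or a double depending on how many values there are, and data.table expects each result column to have one type across all groups. That explanation is my reading of the strictness mentioned above, not something spelled out in this thread.

```r
## median() of integers returns an integer for an odd number of values,
## but a double (the mean of the two middle values) for an even number.
odd  <- median(c(1L, 2L, 3L))
even <- median(c(1L, 2L, 3L, 4L))
print(typeof(odd))    # "integer"
print(typeof(even))   # "double"

## Wrapping the result in as.numeric() gives a double in both cases,
## so every group contributes the same column type.
print(typeof(as.numeric(odd)))   # "double"
```
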
>
> HTH,
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
>  | Memorial Sloan-Kettering Cancer Center
>  | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact





