[R] Efficiently sum 3d table

Bert Gunter gunter.berton at gene.com
Mon Apr 16 22:48:19 CEST 2012


On Mon, Apr 16, 2012 at 1:39 PM, David A Vavra <davavra at verizon.net> wrote:
> OK. I'll take your word for it. The mapply function calls "do_mapply" so I
> would have thought it is passing the operation down to the C code. I haven't
> tracked it any further than below.

No, they can't. Function evaluation must take place at the interpreted
level. However, don't take my word -- take Chambers's.
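
For instance, a minimal sketch (the counter and f below are made up purely
for illustration): even though do_mapply loops in C, it still has to evaluate
the supplied R function once per element, so the interpreted-level cost remains.

calls <- 0L
f <- function(x) { calls <<- calls + 1L; x^2 }
invisible(mapply(f, 1:1000))
calls  # 1000 -- one R-level evaluation of f per element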

-- Bert

>
>> mapply
> function (FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)
> {
>    FUN <- match.fun(FUN)
>    dots <- list(...)
>    answer <- .Call("do_mapply", FUN, dots, MoreArgs, environment(),
>        PACKAGE = "base")
>
> ... etc.
>
>
> -----Original Message-----
> From: Bert Gunter [mailto:gunter.berton at gene.com]
> Sent: Monday, April 16, 2012 4:13 PM
> To: David A Vavra
> Cc: r-help at r-project.org
> Subject: Re: [R] Efficiently sum 3d table
>
> For purposes of clarity only...
>
> On Mon, Apr 16, 2012 at 12:40 PM, David A Vavra <davavra at verizon.net> wrote:
>> Bert,
>>
>> My apologies on the name.
>>
>> I haven't kept any data on loop times. I don't know why lapply seems faster
>> but the difference is quite noticeable. It has struck me as odd. I would
>> have thought lapply would be slower. It has taken an effort to change my
>> thinking to force-fit solutions to it, but I've gotten used to it. As of now
>> I reserve loops for times when there are only a few iterations (as in 10) and
>> for solutions that require passing large amounts of information among
>> iterations. lapply is particularly handy when constructing lists.
>>
>> As for vectorizing, see the code below.
>
> No. Despite the name, this is **not** what I mean by vectorization.
> What I mean is pushing the loops down to the C level rather than doing
> them at the interpreted level, which is where your code below still
> leaves you.
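>
> For example, a rough sketch with toy data (not your tables; exact timings
> will vary by machine):
>
> x <- rnorm(1e6)
> system.time({out <- numeric(length(x)); for (i in seq_along(x)) out[i] <- x[i]^2})  # loop at the interpreted level
> system.time(x^2)  # the same loop pushed down to C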
>
> -- Bert
>
>> Note that it uses mapply, but that simply may have made implementation
>> easier. However, if vectorizing gives an improvement over looping, the
>> mapply may be the reason.
>>
>>> f<-function(x,y,z) catn("do something")
>>> Vectorize(f,c('x','y'))
>> function (x, y, z)
>> {
>>    args <- lapply(as.list(match.call())[-1L], eval, parent.frame())
>>    names <- if (is.null(names(args)))
>>        character(length(args))
>>    else names(args)
>>    dovec <- names %in% vectorize.args
>>     do.call("mapply", c(FUN = FUN, args[dovec], MoreArgs = list(args[!dovec]),
>>         SIMPLIFY = SIMPLIFY, USE.NAMES = USE.NAMES))
>> }
>> <environment: 0x7fb3442553c8>
>>
>> DAV
>>
>>
>> -----Original Message-----
>> From: Bert Gunter [mailto:gunter.berton at gene.com]
>> Sent: Monday, April 16, 2012 3:07 PM
>> To: David A Vavra
>> Cc: r-help at r-project.org
>> Subject: Re: [R] Efficiently sum 3d table
>>
>> David:
>>
>> 1. My first name is Bert.
>>
>> 2. " It never occurred to me that there would be a question."
>> Indeed. But in fact you got solutions for two different
>> interpretations (Greg's is what you wanted). That is what I meant when
>> I said that clarity in asking the question is important.
>>
>> 3. > I have gotten the impression that a for loop is very inefficient.
>> > Whenever I change them to lapply calls there is a noticeable improvement
>> > in run time for whatever reason.
>> I'd like to see your data on this. My experience is that they are
>> typically comparable. Chambers in his "Software for Data Analysis"
>> book says (p. 213), of apply-type functions rather than explicit loops:
>> "The computation should run faster... However, none of the apply
>> mechanisms changes the number of times the supplied function is called,
>> so serious improvements will be limited to iterating simple calculations
>> many times." (A rough timing check is sketched after point 4 below.)
>>
>> 4. You can get serious improvements by vectorizing; and you can do
>> that here, if I understand correctly, because all your arrays have
>> identical dim = d. Here's how:
>>
>> ## assume your list of arrays is in listoftables
>>
>> d <- dim(listoftables[[1]])  ## common dimensions of the arrays
>> alldat <- do.call(cbind, listoftables)  ## this might be the slow part
>> ans <- array(rowSums(alldat), dim = d)
>>
>> See ?rowSums for explanations and caveats, especially with NA's .
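>>
>> Two rough checks, with made-up toy data (not your tables; exact timings
>> will vary by machine). The first compares a for loop with lapply (point 3);
>> the second verifies that the cbind/rowSums route gives the elementwise sum
>> (point 4):
>>
>> ## point 3: loop vs lapply -- same number of R-level calls either way
>> x <- as.list(rnorm(1e5))
>> out <- vector("list", length(x))
>> system.time(for (i in seq_along(x)) out[[i]] <- x[[i]]^2)
>> system.time(lapply(x, function(xi) xi^2))
>>
>> ## point 4: elementwise sum of a list of 3d arrays via cbind/rowSums
>> d <- c(2, 3, 4)
>> listoftables <- replicate(5, array(rpois(prod(d), 3), dim = d), simplify = FALSE)
>> alldat <- do.call(cbind, listoftables)     # each array becomes one column
>> ans <- array(rowSums(alldat), dim = d)     # elementwise sum across the tables
>> all.equal(ans, Reduce("+", listoftables))  # should be TRUE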
>>
>> Cheers,
>> Bert
>>
>> On Mon, Apr 16, 2012 at 11:35 AM, David A Vavra <davavra at verizon.net> wrote:
>>> Thanks Gunter,
>>>
>>> I mean what I think is the normal definition of 'sum' as in:
>>>   T1 + T2 + T3 + ...
>>> It never occurred to me that there would be a question.
>>>
>>> I have gotten the impression that a for loop is very inefficient. Whenever I
>>> change them to lapply calls there is a noticeable improvement in run time
>>> for whatever reason. The problem with lapply here is that I effectively need
>>> a global table to hold the final sum. lapply also wants to return a value.
>>>
>>> You may be correct that in the long run, the loop is the best. There's a lot
>>> of extraneous memory wastage holding all of the tables in a list as well as
>>> the return 'values'.
>>>
>>> As an alternative, and given a pre-existing list of tables, I was thinking of
>>> creating a temporary environment to hold the final result so it could be
>>> passed globally to each lapply execution level, but that seems clunky and
>>> wasteful as well.
>>>
>>> Example in partial code:
>>>
>>> env <- CreatEnv()  # my own function; new.env() would also do
>>> assign('final', T1 - T1, envir = env)  # zero table with the right dims
>>> L <- listOfTables
>>>
>>> lapply(L, function(t) {
>>>     final <- get('final', envir = env) + t
>>>     assign('final', final, envir = env)
>>>     NULL
>>> })
>>>
>>> But I was hoping for a more elegant and hopefully more efficient solution.
>>> Greg's suggestion of using Reduce seems in order, but as yet I'm unfamiliar
>>> with the function.
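>>>
>>> From a quick look at ?Reduce, something along these lines looks like it
>>> would do it (untried on my real tables):
>>>
>>> finalTable <- Reduce(`+`, listOfTables)  # elementwise sum of all the tables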
>>>
>>> DAV
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Bert Gunter [mailto:gunter.berton at gene.com]
>>> Sent: Monday, April 16, 2012 12:42 PM
>>> To: Greg Snow
>>> Cc: David A Vavra; r-help at r-project.org
>>> Subject: Re: [R] Efficiently sum 3d table
>>>
>>> Define "sum". Do you mean you want to get a single sum for each
>>> array? -- get marginal sums for each array? -- get a single array in
>>> which each value is the sum of all the individual values at that
>>> position?
>>>
>>> Giving due thought and consideration to those trying to help, by
>>> formulating your query carefully and concisely, vastly increases the chance of
>>> getting a useful answer. See the posting guide -- this is a skill that
>>> needs to be learned and the guide is quite helpful. And I must
>>> acknowledge that it is a skill that I also have not yet mastered.
>>>
>>> Concerning your query, I would only note that the two responses from
>>> Greg and Petr that you received are unlikely to be significantly
>>> faster than just using loops, since both are still essentially looping
>>> at the interpreted level. Whether either gives you what you want, I do
>>> not know.
>>>
>>> -- Bert
>>>
>>> On Mon, Apr 16, 2012 at 8:53 AM, Greg Snow <538280 at gmail.com> wrote:
>>>> Look at the Reduce function.
>>>>
>>>> On Mon, Apr 16, 2012 at 8:28 AM, David A Vavra <davavra at verizon.net> wrote:
>>>>> I have a large number of 3d tables that I wish to sum.
>>>>> Is there an efficient way to do this? Or perhaps a function I can call?
>>>>>
>>>>> I tried using do.call("sum", listoftables) but that returns a single value.
>>>>>
>>>>> So far, it seems only a loop will do the job.
>>>>>
>>>>>
>>>>> TIA,
>>>>> DAV
>>>



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


