[R] Efficiently sum 3d table

Bert Gunter gunter.berton at gene.com
Mon Apr 16 22:12:39 CEST 2012


For purposes of clarity only...

On Mon, Apr 16, 2012 at 12:40 PM, David A Vavra <davavra at verizon.net> wrote:
> Bert,
>
> My apologies on the name.
>
> I haven't kept any data on loop times. I don't know why lapply seems faster
> but the difference is quite noticeable. It has struck me as odd. I would
> have thought lapply would be slower. It has taken an effort to change my
> thinking to force fit solutions to it but I've gotten used to it. As of now
> I reserve loops to times when there are only a few iterations (as in 10) and
> to solutions that require passing large amounts of information among
> iterations. lapply is particularly handy when constructing lists.
>
> As for vectorizing, see the code below.

No. Despite the name, this is **not** what I mean by vectorization.
What I mean is pushing the loops down to the C level rather than doing
them at the interpreted level, which is where your code below still
leaves you.

-- Bert
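
To make the distinction concrete, here is a minimal sketch. The names x, y, loop_sum and vec_sum are invented for illustration and do not come from the thread. Both versions compute the same element-wise sum; the first iterates in interpreted R code, while the second hands the whole loop to compiled code through a single call to `+`.

```r
# Hypothetical data: two numeric vectors of equal length.
set.seed(1)
x <- runif(1e5)
y <- runif(1e5)

# Interpreted-level loop: R evaluates the loop body once per element.
loop_sum <- numeric(length(x))
for (i in seq_along(x)) {
  loop_sum[i] <- x[i] + y[i]
}

# Vectorized: one call to `+`; the element-wise loop runs in C.
vec_sum <- x + y

# Same answer either way; only the level at which the looping happens differs.
stopifnot(isTRUE(all.equal(loop_sum, vec_sum)))
```

Wrapping either version in system.time() is one way to compare them on your own machine; no timings are claimed here.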

> Note that it uses mapply but that
> simply may have made implementation easier. However, if vectorizing gives an
> improvement over looping, the mapply may be the reason.
>
>> f<-function(x,y,z) catn("do something")
>> Vectorize(f,c('x','y'))
> function (x, y, z)
> {
>    args <- lapply(as.list(match.call())[-1L], eval, parent.frame())
>    names <- if (is.null(names(args)))
>        character(length(args))
>    else names(args)
>    dovec <- names %in% vectorize.args
>    do.call("mapply", c(FUN = FUN, args[dovec],
>        MoreArgs = list(args[!dovec]),
>        SIMPLIFY = SIMPLIFY, USE.NAMES = USE.NAMES))
> }
> <environment: 0x7fb3442553c8>
>
> DAV
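
A small sketch of why the wrapper above does not move the loop to C: because Vectorize() dispatches through mapply, the wrapped R function is still called once per element. The counter `calls` and the toy function `f` below are hypothetical and exist only to make those calls visible.

```r
# Hypothetical scalar function that records how often it is called.
calls <- 0
f <- function(x, y) {
  calls <<- calls + 1   # count each interpreted-level call
  x + y
}

# Vectorize() returns a wrapper that hands x and y to mapply().
vf <- Vectorize(f, c("x", "y"))
res <- vf(1:5, 6:10)

# The values match the truly vectorized 1:5 + 6:10 ...
stopifnot(all(res == 1:5 + 6:10))
# ... but f itself was invoked once for every element.
stopifnot(calls == 5)
```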
>
>
> -----Original Message-----
> From: Bert Gunter [mailto:gunter.berton at gene.com]
> Sent: Monday, April 16, 2012 3:07 PM
> To: David A Vavra
> Cc: r-help at r-project.org
> Subject: Re: [R] Efficiently sum 3d table
>
> David:
>
> 1. My first name is Bert.
>
> 2. "It never occurred to me that there would be a question."
> Indeed. But in fact you got solutions for two different
> interpretations (Greg's is what you wanted). That is what I meant when
> I said that clarity in asking the question is important.
>
> 3. > I have gotten the impression that a for loop is very inefficient. Whenever I
>> change them to lapply calls there is a noticeable improvement in run time
>> for whatever reason.
> I'd like to see your data on this. My experience is that they are
> typically comparable. Chambers, in his "Software for Data Analysis"
> book, says (p. 213), of apply-type functions rather than explicit
> loops: "The computation should run faster... However, none of the
> apply mechanisms changes the number of times the supplied function is
> called, so serious improvements will be limited to iterating simple
> calculations many times."
>
> 4. You can get serious improvements by vectorizing; and you can do
> that here, if I understand correctly, because all your arrays have
> identical dim = d. Here's how:
>
> ## assume your list of arrays is in listoftables
>
> alldat <- do.call(cbind, listoftables) ## one column per array; this might be the slow part
> ans <- array(rowSums(alldat), dim = d) ## sum across arrays, then restore the shape
>
> See ?rowSums for explanations and caveats, especially with NAs.
>
> Cheers,
> Bert
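
A worked sketch of the cbind/rowSums idea on toy data. The dimensions d and the three arrays are made up for illustration. It relies on cbind() treating each 3-d array as a plain vector, so every array becomes one column of a prod(d)-row matrix, and the final check confirms the result against adding the arrays directly.

```r
# Hypothetical input: three 3-d arrays sharing the same dimensions d.
d <- c(2, 3, 4)
listoftables <- list(
  array(1:24, dim = d),
  array(rep(10, 24), dim = d),
  array(rep(100, 24), dim = d)
)

# Each array is flattened into one column: a prod(d) x 3 matrix.
alldat <- do.call(cbind, listoftables)

# rowSums() adds across the columns in compiled code;
# array() puts the sums back into the original shape.
ans <- array(rowSums(alldat), dim = d)

# Agrees with the element-wise sum T1 + T2 + T3.
direct <- listoftables[[1]] + listoftables[[2]] + listoftables[[3]]
stopifnot(isTRUE(all.equal(ans, direct)))
```

With the default na.rm = FALSE, an NA in any one array makes the corresponding cell of the result NA.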
>
> On Mon, Apr 16, 2012 at 11:35 AM, David A Vavra <davavra at verizon.net> wrote:
>> Thanks Gunter,
>>
>> I mean what I think is the normal definition of 'sum' as in:
>>   T1 + T2 + T3 + ...
>> It never occurred to me that there would be a question.
>>
>> I have gotten the impression that a for loop is very inefficient. Whenever I
>> change them to lapply calls there is a noticeable improvement in run time
>> for whatever reason. The problem with lapply here is that I effectively need
>> a global table to hold the final sum. lapply also wants to return a value.
>>
>> You may be correct that in the long run, the loop is the best. There's a lot
>> of extraneous memory wastage holding all of the tables in a list as well as
>> the return 'values'.
>>
>> As an alternate and given a pre-existing list of tables, I was thinking of
>> creating a temporary environment to hold the final result so it could be
>> passed globally to each lapply execution level but that seems clunky and
>> wasteful as well.
>>
>> Example in partial code:
>>
>> env <- new.env()  # stands in for my own CreatEnv() function
>> assign('final', T1 - T1, envir = env)  # zero table with the shape of T1
>> L <- listOfTables
>>
>> lapply(L, function(t) {
>>        final <- get('final', envir = env) + t
>>        assign('final', final, envir = env)
>>        NULL
>> })
>>
>> But I was hoping for a more elegant and hopefully more efficient solution.
>> Greg's suggestion for using Reduce seems in order but as yet I'm unfamiliar
>> with the function.
>>
>> DAV
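
For readers unfamiliar with Reduce, a minimal sketch of how it replaces the environment-based accumulator above. The three small arrays are hypothetical stand-ins for the poster's tables. Reduce still walks the list at the interpreted level, but each individual `+` is a vectorized array addition.

```r
# Hypothetical stand-ins for the tables: same-shaped 3-d arrays.
listOfTables <- list(
  array(1, dim = c(2, 2, 2)),
  array(2, dim = c(2, 2, 2)),
  array(3, dim = c(2, 2, 2))
)

# Reduce() folds `+` over the list: (T1 + T2) + T3, and so on.
# No environment or global accumulator is needed.
final <- Reduce(`+`, listOfTables)

# Every cell is 1 + 2 + 3 = 6 and the 3-d shape is preserved.
stopifnot(all(final == 6))
stopifnot(identical(dim(final), c(2L, 2L, 2L)))
```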
>>
>>
>>
>> -----Original Message-----
>> From: Bert Gunter [mailto:gunter.berton at gene.com]
>> Sent: Monday, April 16, 2012 12:42 PM
>> To: Greg Snow
>> Cc: David A Vavra; r-help at r-project.org
>> Subject: Re: [R] Efficiently sum 3d table
>>
>> Define "sum". Do you mean you want to get a single sum for each
>> array? -- get marginal sums for each array? -- get a single array in
>> which each value is the sum of all the individual values at the
>> position?
>>
>> Due thought and consideration for those trying to help by formulating
>> your query carefully and concisely vastly increases the chance of
>> getting a useful answer. See the posting guide -- this is a skill that
>> needs to be learned and the guide is quite helpful. And I must
>> acknowledge that it is a skill that I also have not yet mastered.
>>
>> Concerning your query, I would only note that the two responses from
>> Greg and Petr that you received are unlikely to be significantly
>> faster than just using loops, since both are still essentially looping
>> at the interpreted level. Whether either gives you what you want, I do
>> not know.
>>
>> -- Bert
>>
>> On Mon, Apr 16, 2012 at 8:53 AM, Greg Snow <538280 at gmail.com> wrote:
>>> Look at the Reduce function.
>>>
>>> On Mon, Apr 16, 2012 at 8:28 AM, David A Vavra <davavra at verizon.net> wrote:
>>>> I have a large number of 3d tables that I wish to sum.
>>>> Is there an efficient way to do this? Or perhaps a function I can call?
>>>>
>>>> I tried using do.call("sum",listoftables) but that returns a single value.
>>>>
>>>> So far, it seems only a loop will do the job.
>>>>
>>>>
>>>> TIA,
>>>> DAV



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


