[R] calculating the mean of a random matrix (by row) and some general questions

David Winsemius dwinsemius at comcast.net
Tue Jul 19 23:25:06 CEST 2011


On Jul 19, 2011, at 4:18 PM, Peter Lomas wrote:

> Hi Richard,
>
> As others have said, try to use the "apply" functions rather than
> loops. There is also an apply function for lists, see ?lapply. This is
> much more efficient.

Actually, the "apply" functions are not "more efficient" in the usual
sense of execution time, and sometimes they are rather inefficient.
Prior discussions of this topic in the archives should be easy to
find. Their economy is in expression, and their advantage is in code
creation and maintenance.
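A small illustration of the "economy of expression" point, not part of the original benchmark: the same n means computed once with a preallocated for-loop and once with sapply(). Given the same seed the two results are identical; the sapply() form is simply shorter to write and read. The names loop_means and sapply_means are hypothetical.

```r
# The same computation written two ways; the function names are illustrative only.
loop_means <- function(n, m) {
  res <- numeric(n)                 # preallocate the result vector
  for (i in seq_len(n)) {           # visit i = 1, ..., n
    res[i] <- mean(rexp(m))
  }
  res
}

sapply_means <- function(n, m) {
  # one mean per element of 1..n; sapply() simplifies the results to a vector
  sapply(seq_len(n), function(i) mean(rexp(m)))
}

set.seed(1)
a <- loop_means(100, 100)
set.seed(1)
b <- sapply_means(100, 100)
identical(a, b)   # TRUE: same random draws, same order, same arithmetic
```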

Doubters of this proposition should consider these results:

library(rbenchmark)  # the help page has a more compact version of these tests

# Each function should return n means, each taken over m Exp(1) draws.
means.rep = function(n, m) {res1 <- replicate(n, mean(rexp(m)))}
means.colMn = function(n, m) {res2 <- colMeans(matrix(rexp(n*m), nrow = m, ncol = n))}
means.tapply = function(n, m) {res3 <- tapply(rexp(n*m), rep(1:n, each = m), mean)}
means.apply = function(n, m) {res4 <- apply(matrix(rexp(m*n), n, m), 1, mean)}
means.forloop = function(n, m) {res5 <- vector(length = n, mode = "numeric")
                  for (i in n) {res5[i] <- mean(rexp(m))} }  # note: visits only i = n
benchmark(
    repl = means.rep(100, 100),
    tappl = means.tapply(100, 100),
    appl = means.apply(100, 100),
    pat = means.colMn(100, 100),      # the colMeans version
    forloop = means.forloop(100, 100),
    replications = 100,
    columns = c("test", "replications", "relative", "elapsed"),
    order = 'elapsed')

###
Results:
      test replications relative elapsed
5 forloop          100     1.00   0.004
4     pat          100    20.25   0.081
1    repl          100    77.00   0.308
3    appl          100    89.75   0.359
2   tappl          100   264.50   1.058

I admit that I was rather surprised to see the for-loop beating
colMeans by such a wide margin, and this makes me wonder whether I
reversed some index or coded the for-loop test wrong, so I would
appreciate some auditing and improvement of this test. (I don't see
how I could have reversed the order, since n and m are both 100. I
also tried adding assignments to check whether only promises were
being created with no calculation; the relative efficiencies stay the
same.)
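One likely explanation for the surprising for-loop timing, offered as a suggestion rather than a confirmed diagnosis: `for (i in n)` iterates over the single value n, not over 1:n, so the loop body runs once per call. The function also returns NULL, because the value of a for-loop is an invisible NULL. The sketch below checks how many means each candidate actually returns; means.forloop2 is a hypothetical corrected version, not code from the thread.

```r
# The loop as benchmarked: `for (i in n)` visits only i = n, so one mean is computed.
means.forloop <- function(n, m) {
  res5 <- numeric(n)
  for (i in n) {res5[i] <- mean(rexp(m))}
}

# A corrected sketch: iterate over 1..n and return the filled vector.
means.forloop2 <- function(n, m) {
  res5 <- numeric(n)
  for (i in seq_len(n)) {res5[i] <- mean(rexp(m))}
  res5
}

# The colMeans version, for comparison: an m x n matrix, one mean per column.
means.colMn <- function(n, m) colMeans(matrix(rexp(n * m), nrow = m, ncol = n))

# Sanity check: every candidate should return n = 100 means.
length(means.forloop(100, 100))    # 0: the function returns NULL
length(means.forloop2(100, 100))   # 100
length(means.colMn(100, 100))      # 100
```

Substituting means.forloop2 into the benchmark() call gives the loop the same amount of work as the other candidates; no timings for that run are claimed here.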

-- 
David.


>  I also like writing my own functions.  For example:
>
> f <- function(x) {
>   x^2
> }
>
> Which can then be used by:
>> f(2)
> [1] 4
>
> This is very useful if you're getting into maximum likelihood programming,
> or want to use the "optim" function (for multivariate functions) or
> "optimize" (for univariate functions).
>
> Lastly, check out the R reference card.
> http://cran.r-project.org/doc/contrib/Short-refcard.pdf
>
> Regards,
> Peter
>
> On Tue, Jul 19, 2011 at 12:43, RichardLang <lang at zedat.fu-berlin.de> wrote:
>
>> Hi everyone!
>>
>> I'm trying to teach myself R in order to do some data analysis. I'm a
>> mathematics student and (only) familiar with MATLAB and LaTeX. I'm working
>> through the "official" introduction to R at the moment, while simultaneously
>> solving some exercises I found on the web. Before I post my (probably
>> stupid) question, I'd like to ask you for some general advice. How do you
>> work with R? Is it like MATLAB, where you write your functions with a lot
>> of loops etc. in a text file and then run it? Or do you just prepare your
>> data and then use the functions provided by R (plot, mean, etc.) to get
>> some analysis? I'd be very thankful for some of your thoughts about
>> "approaches".
>>
>> Now the question: I'm trying to build a vector with n entries, each
>> consisting of the mean of m random numbers (exponentially distributed, for
>> example). My approach was to construct an n x m random matrix and then
>> somehow take the mean of each row. But the mean function has no
>> parameter to do this, so the intended approach in R is probably
>> different. Any ideas? =)
>>
>> Richard
>
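
For the original question: mean() has no dimension argument, but base R provides rowMeans() (and colMeans()) for exactly this case. A minimal sketch, assuming n = 5 rows of m = 1000 exponential draws with the default rate of 1; the values of n and m are arbitrary choices for illustration.

```r
n <- 5      # number of means wanted
m <- 1000   # draws averaged per mean

# n x m matrix of Exp(1) draws, then one mean per row
x <- matrix(rexp(n * m), nrow = n, ncol = m)
row_means <- rowMeans(x)

# The same result via apply(), which also works with any other summary function
row_means2 <- apply(x, 1, mean)
all.equal(row_means, row_means2)   # TRUE
```

rowMeans() is the specialised (and usually faster) route; apply(x, 1, f) is the general pattern when the per-row summary is something other than a mean.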

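To illustrate the suggestion to use optimize() for univariate maximum likelihood, a minimal sketch that estimates the rate of an exponential sample by minimizing the negative log-likelihood. The simulated data, the true rate of 2, and the search interval are assumptions chosen for illustration; for this model the closed-form MLE is 1/mean(x), which gives a check on the numerical answer.

```r
set.seed(42)
x <- rexp(500, rate = 2)   # simulated data; the true rate 2 is an arbitrary choice

# Negative log-likelihood of the sample as a function of the rate
nll <- function(rate) -sum(dexp(x, rate = rate, log = TRUE))

# optimize() minimizes by default over the supplied interval
fit <- optimize(nll, interval = c(0.01, 10))
fit$minimum      # numerical MLE of the rate
1 / mean(x)      # closed-form MLE, for comparison
```

For a multi-parameter likelihood the same negative log-likelihood idea carries over to optim(), which takes a vector of starting values instead of an interval.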

David Winsemius, MD
West Hartford, CT


