[R] calculating the mean of a random matrix (by row) and some general questions
David Winsemius
dwinsemius at comcast.net
Tue Jul 19 23:25:06 CEST 2011
On Jul 19, 2011, at 4:18 PM, Peter Lomas wrote:
> Hi Richard,
>
> As others have said, try to use the "apply" functions rather than
> loops. There is also an apply function for lists, see ?lapply. This
> is much more efficient.
Actually, the "apply" functions are not "more efficient" in the usual
sense of execution time, and sometimes they are rather inefficient.
Prior discussions of this topic in the archives should be easy to
find. The economy is in expression, and the advantage is in code
creation and maintenance.
Doubters of this proposition should consider these results:
library(rbenchmark)  # help page has a more compact version of these tests

means.rep = function(n, m) {
  res1 <- vector(length=100, mode="numeric")
  res1 <- replicate(n, mean(rexp(m)))
}
means.colMn = function(n, m) {
  res2 <- vector(length=100, mode="numeric")
  res2 <- colMeans(matrix(rexp(n*m), c(m, n)))
}
means.tapply = function(n, m) {
  res3 <- vector(length=100, mode="numeric")
  res3 <- tapply(rexp(n*m), rep(1:n, each = m), mean)
}
means.apply = function(n, m) {
  res4 <- vector(length=100, mode="numeric")
  res4 <- apply(matrix(rexp(m*n), n, m), 1, mean)
}
means.forloop = function(n, m) {
  res5 <- vector(length=100, mode="numeric")
  for (i in n) { res5[i] <- mean(rexp(m)) }
}
benchmark(
  repl    = means.rep(100, 100),
  tappl   = means.tapply(100, 100),
  appl    = means.apply(100, 100),
  pat     = means.colMn(100, 100),   # the colMeans version
  forloop = means.forloop(100, 100),
  replications = 100,
  columns = c("test", "replications", "relative", "elapsed"),
  order = "elapsed")
###
Results:
     test replications relative elapsed
5 forloop          100     1.00   0.004
4     pat          100    20.25   0.081
1    repl          100    77.00   0.308
3    appl          100    89.75   0.359
2   tappl          100   264.50   1.058
I admit that I was rather surprised to see the for-loop beating
colMeans by such a wide margin, and this is making me wonder if I
reversed some index or coded the for-loop test wrong. So I would
appreciate some auditing and improvement of this test. (But I don't
see how I could have reversed the order, since n and m are both
100. And I tried adding assignments to see if there were only promises
being made with no calculations. The relative efficiencies stay the
same.)
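
For anyone auditing the for-loop test, here is a minimal sketch of one
likely culprit: `for (i in n)` with n = 100 loops over the single value
100, not over 1:100, so the body runs once. The names
iterations.posted, iterations.fixed, means.forloop2 and means.colMn2
are hypothetical and are not part of the benchmark above.

```r
# Audit sketch (an assumption-labelled illustration, not the original test).
n <- 100
m <- 100

# As posted: `n` is a single number, so `i` only ever takes the value 100.
iterations.posted <- 0
for (i in n) iterations.posted <- iterations.posted + 1

# Corrected iteration: seq_len(n) yields 1, 2, ..., n.
iterations.fixed <- 0
for (i in seq_len(n)) iterations.fixed <- iterations.fixed + 1

# Hypothetical corrected loop version. The result is returned explicitly,
# because a for loop by itself evaluates to NULL.
means.forloop2 <- function(n, m) {
  res <- numeric(n)
  for (i in seq_len(n)) res[i] <- mean(rexp(m))
  res
}

# colMeans version with the matrix dimensions spelled out:
# m rows (draws per sample) by n columns (one column per sample).
means.colMn2 <- function(n, m) {
  colMeans(matrix(rexp(n * m), nrow = m, ncol = n))
}

print(c(posted = iterations.posted, fixed = iterations.fixed))
print(length(means.forloop2(n, m)))
print(length(means.colMn2(n, m)))
```

If this reading is right, the posted loop computed one mean of 100
draws per call while the other tests computed 100 such means, so the
timings above are not a like-for-like comparison. The benchmark would
need to be rerun with a loop like means.forloop2 before drawing
conclusions; no new timings are shown here.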
--
David.
> I also like writing my own functions. For example:
>
> f <- function(x) {
> x^2
> }
>
> Which can then be used by:
> > f(2)
> [1] 4
>
> This is very useful if you're getting into maximum likelihood
> programming, or want to use the "optim" function (for multivariate
> functions) or "optimize" (for univariate functions).
>
> Lastly, check out the R reference card.
> http://cran.r-project.org/doc/contrib/Short-refcard.pdf
>
> Regards,
> Peter
>
> On Tue, Jul 19, 2011 at 12:43, RichardLang <lang at zedat.fu-berlin.de>
> wrote:
>
>> Hi everyone!
>>
>> I'm trying to teach myself R in order to do some data analysis. I'm a
>> mathematics student and (only) familiar with matlab and latex. I'm
>> working through the "official" introduction to R at the moment, while
>> simultaneously solving some exercises I found on the web. Before I
>> post my (probably stupid) question, I'd like to ask you for some
>> general advice. How do you work with R? Is it like in matlab, that you
>> write your functions with a lot of loops etc. in a text file and then
>> run it? Or do you just prepare your data and then use the functions
>> provided by R (plot, mean etc.) to get some analysis? I'd be very
>> thankful for some of your thoughts about "approaches".
>>
>> Now the question: I'm trying to build a vector with n entries, each
>> consisting of the mean of m random numbers (exponentially distributed,
>> for example). My approach was to construct an n x m random matrix and
>> then to somehow take the mean of each row. But in the mean function
>> there is no parameter to do this, so the intended approach of R is
>> probably different. Any ideas? =)
>>
>> Richard
>
David Winsemius, MD
West Hartford, CT