[R] run function on subsets of matrix

Sun Mar 27 09:22:00 CEST 2011

On Mar 27, 2011, at 08:25 , David Winsemius wrote:

> 
> On Mar 26, 2011, at 10:26 PM, fisken wrote:
> 
>> I was wondering if it is possible to do the following in a smarter way.
>> 
>> I want to get the mean value across the columns of a matrix, but I want

_along_ the columns, I assume. 

>> to do this on subrows of the matrix, given by some vector (same length
>> as the number of rows). Something like
>> 
>> nObs<- 6
>> nDim <- 4
>> m  <-   matrix(rnorm(nObs*nDim),ncol=nDim)
>> fac <- sample(1:(nObs/2), nObs, replace=TRUE)
>> 
>> ## loop through the different 'factor' levels
>> for (i in unique(fac))
>>   print(apply(m[fac==i,],2,mean))
> 
> This would be a lot simpler and faster:
> 
> colMeans(m[unique(fac),])
> 
> #[1]  1.3595197 -0.1374411  0.1062527 -0.3897732
> 

Say what??? (I suspect David needs to get his sleep - or coffee, if he is in Europe.)

How about:

> aggregate(m,list(fac),mean)
  Group.1          V1         V2         V3           V4
1       1 -0.03785420 -0.2573805 -0.3025759  0.006999996
2       2 -1.39961300  0.2296900 -0.1122359 -0.302734531
3       3  0.50886649  0.6546153 -0.4270368 -0.411807709
> by(m,list(fac),colMeans)
: 1
          V1           V2           V3           V4 
-0.037854195 -0.257380542 -0.302575901  0.006999996 
------------------------------------------------------------- 
: 2
        V1         V2         V3         V4 
-1.3996130  0.2296900 -0.1122359 -0.3027345 
------------------------------------------------------------- 
: 3
        V1         V2         V3         V4 
 0.5088665  0.6546153 -0.4270368 -0.4118077 
> 

(whereas 
> fac
[1] 3 1 1 2 3 1
> colMeans(m[unique(fac),])
[1]  0.39949029 -0.10989080 -0.96655778  0.01262903
> colMeans(m[1:3,])
[1]  0.39949029 -0.10989080 -0.96655778  0.01262903
)
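
To spell out why the two colMeans() calls agree: unique(fac) returns the distinct values of fac (here 3, 1, 2), and using them as a row index simply picks rows 3, 1 and 2 of m, so no grouping takes place. A minimal sketch with a small made-up matrix that can be checked by hand (m2, fac2, idx, wrong, same and grp_means are illustrative names, not from the thread; the rowsum() approach is just one further alternative to aggregate() and by()):

```r
# Hypothetical data: 6 rows, 2 columns, easy to verify by hand.
m2   <- matrix(1:12, ncol = 2)
fac2 <- c(3, 1, 1, 2, 3, 1)      # same grouping vector as in the session above

idx   <- unique(fac2)            # distinct values, in order of first appearance
wrong <- colMeans(m2[idx, ])     # means of rows 3, 1, 2 only: not per group
same  <- colMeans(m2[1:3, ])     # identical, because the same rows are used

# Per-group column means: rowsum() adds up the rows within each group,
# and dividing by the group sizes turns the sums into means.
# (The division recycles the sizes down the rows, one size per group.)
grp_means <- rowsum(m2, fac2) / as.vector(table(fac2))
```

Here grp_means has one row per level of fac2, including the level that occurs only once.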

>> 
>> Now, the problem is that if a value in 'fac' only occurs once, the
>> 'apply' function will complain.
> 
> Because "[" drops dimensions of extent one, so the matrix becomes a vector and loses its second margin. Use drop=FALSE to prevent this, and note the extra comma:
> 
> print(apply(m[1, , drop=FALSE],2,mean))

Yep. 
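
Applied to the original loop, that would look roughly as follows. This is only a sketch: the set.seed() call, the fixed fac (taken from the session shown above, where level 2 occurs once) and the res list are additions for illustration, so that the result is reproducible and can be inspected afterwards.

```r
set.seed(1)                      # assumption: fixed seed for reproducibility
nObs <- 6
nDim <- 4
m   <- matrix(rnorm(nObs * nDim), ncol = nDim)
fac <- c(3, 1, 1, 2, 3, 1)       # level 2 occurs only once

res <- list()
for (i in sort(unique(fac))) {
  # drop = FALSE keeps a one-row selection as a 1 x nDim matrix
  sub <- m[fac == i, , drop = FALSE]
  res[[as.character(i)]] <- colMeans(sub)
}
```

For the singleton level, res[["2"]] is just row 4 of m, and no error is raised.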

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com