[R] aov error with large data set

Peter Dalgaard p.dalgaard at biostat.ku.dk
Wed Jul 16 20:27:26 CEST 2008


Mike Lawrence wrote:
> I'm looking to analyze a large data set: a within-Ss 2*2*1500 design 
> with 20 Ss. However, aov() gives me an error, reproducible as follows:
>
> id = factor(1:20)
> a = factor(1:2)
> b = factor(1:2)
> d = factor(1:1500)
> temp = expand.grid(id=id, a=a, b=b, d=d)
> temp$y = rnorm(length(temp[, 1])) #generate some random DV data
> this_aov = aov(
>     y~a*b*d+Error(id/(a*b*d))
>     , data=temp
> )
>
> Which yields the following error:
> "
> Error in model.matrix.default(mt, mf, contrasts) :
>   allocMatrix: too many elements specified
> "
>
> Any suggestions?
>
This is an inherent weakness of aov(), or at least the current 
implementation thereof. You end up fitting a set of linear models with a 
huge number of parameters, in order to get the separation into strata. 
The column dimensions of the design matrices are the number of random 
effects, and if you have 60000 of those, you run out of storage. (As 
written, you even have 120000=20*2*2*1500 for the id*a*b*d term, but 
removing it isn't really going to help.)
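
To make the sizes concrete, here is a rough back-of-the-envelope sketch. It assumes one dense column per random-effect level in each stratum, and that the "too many elements" message is triggered when rows times columns exceeds R's integer limit; the variable names are just for illustration.

```r
# Level counts taken from the example above
n <- c(id = 20, a = 2, b = 2, d = 1500)

# Terms produced by expanding Error(id/(a*b*d))
strata <- list("id", c("id", "a"), c("id", "b"), c("id", "d"),
               c("id", "a", "b"), c("id", "a", "d"), c("id", "b", "d"),
               c("id", "a", "b", "d"))
names(strata) <- sapply(strata, paste, collapse = ":")

# Number of random-effect levels (design-matrix columns) per term
cells <- sapply(strata, function(f) prod(n[f]))

n_obs    <- prod(n)            # 120000 rows in the data
elements <- n_obs * cells      # rows * columns of a dense design matrix
too_big  <- elements > .Machine$integer.max
gigabytes <- elements * 8 / 1e9   # 8 bytes per double

print(data.frame(columns = cells, too_big = too_big, GB = gigabytes))
```

Under these assumptions the id:a:d and id:b:d terms (60000 columns each) already exceed the limit, so dropping id:a:b:d alone does not rescue the fit.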

(30 years ago, a much more efficient algorithm was implemented in 
Genstat, but we seem to be short of volunteers to reimplement it...)

Ideas? Here are three:

lme4 should be able to handle such designs. It won't give you the df for 
the F tests, but you could work them out by hand.
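
A minimal sketch of what the lme4 route might look like, on a scaled-down design (d has 4 levels here, not 1500). The random-effects structure is an assumption: one random intercept per error stratum of the aov() call, leaving out id:a:b:d because it has one level per observation and so coincides with the residual. This has not been tried at full size.

```r
# Scaled-down illustration only: d has 4 levels instead of 1500
set.seed(1)
temp <- expand.grid(id = factor(1:20), a = factor(1:2),
                    b = factor(1:2), d = factor(1:4))
temp$y <- rnorm(nrow(temp))   # pure-noise DV, as in the original post

fit <- NULL
if (requireNamespace("lme4", quietly = TRUE)) {
  # One random intercept per stratum; id:a:b:d is omitted (it is the residual)
  fit <- lme4::lmer(
    y ~ a * b * d +
      (1 | id) + (1 | id:a) + (1 | id:b) + (1 | id:d) +
      (1 | id:a:b) + (1 | id:a:d) + (1 | id:b:d),
    data = temp
  )
  # F statistics are reported without denominator df or p-values
  print(anova(fit))
}
```

With pure-noise data the variance components are estimated at or near zero, so expect a message about a singular fit; that is a property of the simulated data, not of the model specification.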

Or, you could try recasting as a multivariate lm problem (see my recent 
R News paper). This is still pretty huge, but this time the limiting 
quantity is the 6000*6000 empirical covariance matrix (6000 = 2*2*1500 
within-subject cells), which could be manageable.
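
A sketch of the multivariate recasting, again scaled down (d has 3 levels, so 12 response columns instead of 6000). It follows the pattern in the examples of ?anova.mlm: one row per subject, one column per within-subject cell, and the within-subject design described by a data frame passed as idata. The object names and the particular tests shown are illustrative choices.

```r
# Scaled-down illustration only: d has 3 levels instead of 1500
set.seed(1)
n_id <- 20

# Within-subject design: one row per response column of Y
idata <- expand.grid(a = factor(1:2), b = factor(1:2), d = factor(1:3))

# Wide data: subjects in rows, within-subject cells in columns
Y <- matrix(rnorm(n_id * nrow(idata)), nrow = n_id)

mlmfit  <- lm(Y ~ 1)              # intercept-only multivariate model
mlmfit0 <- update(mlmfit, ~ 0)    # null model for comparison

# Main effect of d, assuming sphericity (GG and HF corrections are reported)
tab_d <- anova(mlmfit, mlmfit0, M = ~ d, X = ~ 1,
               idata = idata, test = "Spherical")

# Three-way interaction: everything not spanned by the two-way model
tab_abd <- anova(mlmfit, mlmfit0, X = ~ (a + b + d)^2,
                 idata = idata, test = "Spherical")

print(tab_d)
print(tab_abd)
```

At full size Y would be 20 by 6000 and the SSD matrix 6000 by 6000, roughly 288 MB as doubles.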

Or, the most efficient way, but much more work for you: generate the 
relevant tables of means and residuals, e.g. by placing your data in a 
20*2*2*1500 table and using the relevant combinations of apply() and 
sweep(). These can be used to generate the relevant sums of squares.
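
As an illustration of the tables-of-means approach, here is a sketch for a single effect: the main effect of a, tested against the id:a stratum. It runs at full size, since only a 120000-element array is needed. The other effects follow the same pattern with different margins; the variable names are made up for the example.

```r
set.seed(1)
n_id <- 20; n_a <- 2; n_b <- 2; n_d <- 1500

temp <- expand.grid(id = factor(1:n_id), a = factor(1:n_a),
                    b = factor(1:n_b), d = factor(1:n_d))
temp$y <- rnorm(nrow(temp))

# expand.grid varies id fastest, then a, b, d, which matches array() filling
y <- array(temp$y, dim = c(n_id, n_a, n_b, n_d))

grand <- mean(y)
m_id  <- apply(y, 1, mean)         # subject means
m_a   <- apply(y, 2, mean)         # means for each level of a
m_ida <- apply(y, c(1, 2), mean)   # subject-by-a table of means

# Sum of squares for a: each level mean averages n_id * n_b * n_d values
SS_a <- n_id * n_b * n_d * sum((m_a - grand)^2)
df_a <- n_a - 1

# Residuals of the id-by-a table after removing both margins
r_ida  <- sweep(sweep(m_ida, 1, m_id), 2, m_a) + grand
SS_ida <- n_b * n_d * sum(r_ida^2)
df_ida <- (n_id - 1) * (n_a - 1)

F_a <- (SS_a / df_a) / (SS_ida / df_ida)
p_a <- pf(F_a, df_a, df_ida, lower.tail = FALSE)
```

Because a has only two levels, F_a should equal the square of the paired t statistic computed on the two columns of m_ida, which gives a quick sanity check.
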
> Mike
>
> -- 
> Mike Lawrence
> Graduate Student, Department of Psychology, Dalhousie University
>
> www.memetic.ca
>
> "The road to wisdom? Well, it's plain and simple to express:
> Err and err and err again, but less and less and less."
>     - Piet Hein
>
"Problems worthy of attack, prove their worth by hitting back" - Piet Hein

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907


