[R-sig-ME] Large data set with several nesting levels and autocorrelated errors

lorenz.gygax at art.admin.ch
Tue Jul 24 16:06:24 CEST 2012

Dear Mixed-Modelers,

We are measuring brain activation in animals by non-invasively tracking brain hemodynamics using near-infrared light. We have measured 24 subjects (12 in each of two housing conditions) in three experimental conditions each, and presented the corresponding stimuli 12 times per condition (12 blocks). Each presentation lasts 75 seconds, and we obtain a measurement at 1 Hz on eight channels representing different locations on the head.

This results in 24 * 3 * 12 * 75 * 8 = 518,400 rows in a complete data set (a few stimulations are actually missing due to measurement artifacts, leaving a bit less than 500,000 rows in the actual data set). So far, we have usually calculated block averages across the 12 repetitions per stimulus (using the median at each time point across the 12 repetitions). In the current experiment, we have the impression that this averaging smooths away too much of the pattern that is visible in the raw data, and thus we would like to work with the complete, non-averaged data set. We are specifically interested in the shape of the time-course of the signal.

A full model as we have been using would look something like this:

lme(hemo ~ expCond * housingCond * ns(time, df = 7) * locHead,
    random = ~ 1 | subj/expCond/block/locHead,
    correlation = corARMA(form = ~ 1 | subj/expCond/block/locHead, p = 3))
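(Note that corARMA is a correlation structure and therefore belongs in the correlation argument, not in weights, which nlme reserves for variance functions.) For concreteness, a minimal runnable version of this kind of call on simulated data, with purely illustrative names and sizes, would be:

```r
library(nlme)     # lme() and the corAR1/corARMA correlation structures
library(splines)  # ns() for the natural-spline basis over time

set.seed(1)
# Small illustrative data set: 4 subjects x 2 conditions x 2 blocks x 20 time points
d <- expand.grid(subj = factor(1:4), expCond = factor(1:2),
                 block = factor(1:2), time = 1:20)
d$hemo <- rnorm(nrow(d))

fit <- lme(hemo ~ expCond * ns(time, df = 3),
           random = ~ 1 | subj/expCond/block,
           correlation = corAR1(form = ~ time | subj/expCond/block),
           data = d)
summary(fit)
```

The correlation grouping must be at (or below) the innermost random-effects grouping level, which is why both formulas use the same subj/expCond/block nesting here.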

From what the data show, an AR(1) process would usually suffice, but somehow the ARMA model with three parameters seems to lead to more numerically stable estimates.

This model unfortunately fails due to memory limits on a 64-bit Windows 7 Enterprise machine with 24 GB of RAM (sessionInfo is attached at the end). I have only now noticed that this is not the most recent version of R (this is an office machine and could and should be updated, of course). Would updating make a (the?) difference with respect to memory management?

I presume that other functions such as lmer would be more economical with memory, but from searching the archives and the web sites on R and mixed models, there does not seem to be an alternative function that is both more economical and able to incorporate temporal correlation in the residuals. Is this correct?

Does anyone happen to know whether it is possible to make virtual memory on a Windows machine visible to R? (We have set up some virtual memory on that machine, but R only recognizes the 24 GB of actual RAM.)

Is it possible to estimate how much memory would be needed?
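As a back-of-the-envelope starting point (my own rough assumption, not a statement about lme's internals): the dense fixed-effects model matrix alone for the full four-way interaction has one column per cell of the factor crossing, and lme will hold several objects of comparable size during fitting, so the true requirement is a multiple of this figure:

```r
n <- 500000                                 # approximate number of rows in the data set
# Columns of the full-interaction model matrix:
# (intercept + 2 for expCond) x (intercept + 1 for housingCond)
#   x (intercept + 7 for ns(time, df = 7)) x (intercept + 7 for locHead)
p <- (1 + 2) * (1 + 1) * (1 + 7) * (1 + 7)  # 384 columns
bytes <- n * p * 8                          # 8 bytes per double
round(bytes / 2^30, 2)                      # roughly 1.4 GiB for one dense copy
```

Even a handful of copies of a matrix this size should fit in 24 GB, which suggests the bottleneck is elsewhere (e.g. objects built for the random-effects and correlation structures), but I cannot estimate those from the outside.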

Any further thoughts, or thoughts on alternative approaches?

Many thanks and best wishes, Lorenz
Lorenz Gygax
Centre for proper housing of ruminants and pigs
Tänikon, CH-8356 Ettenhausen / Switzerland

R version 2.12.2 (2011-02-25)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252    LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C                       
[5] LC_TIME=German_Switzerland.1252    

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] nlme_3.1-98

loaded via a namespace (and not attached):
[1] grid_2.12.2     lattice_0.19-17 tools_2.12.2   
