[R-sig-ME] Independence of residuals in a computer-based experiment/simulation analysed using a LME?

Gavin Simpson gavin.simpson at ucl.ac.uk
Tue Oct 16 13:54:19 CEST 2012

Dear List

I know cross posting to multiple fora is "bad" but I have had little
response to a question I posted on the CrossValidated site yesterday and
I wondered if I might exploit the expertise on this list to solicit an
answer to my query. CV Question is here (with nicer formatting):

The R-related and Mixed-Model-related bit is that I am doing all the
analysis in R and am using **lme4** and `lmer()`.

I conducted a computer-based assessment of different methods of fitting
a particular type of model used in the palaeo sciences. I had a
large-ish training set and so I randomly (stratified random sampling)
set aside a test set. I fitted m different methods to the training set
samples and using the m resulting models I predicted the response for
the test set samples and computed a RMSEP over the samples in the test
set. This is a single run.

I then repeated this process a large number of times, each time choosing
a different training set by randomly sampling a new test set.
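
A minimal sketch of the resampling design described above, in base R. The
simulated data, the quartile-based strata, the three placeholder "methods"
(simple `lm()` fits) and the names `one_run`, `rmsep` and `test_frac` are all
assumptions made for illustration; they are not the palaeo models or data
from the actual study.

```r
## Hypothetical stand-in data: one predictor, one response.
set.seed(42)
n <- 200
dat <- data.frame(x = runif(n))
dat$y <- 2 + 3 * dat$x + rnorm(n, sd = 0.5)
## Strata for the stratified sampling: quartiles of the response (assumed).
dat$stratum <- cut(dat$y, breaks = quantile(dat$y, 0:4 / 4),
                   include.lowest = TRUE)

## Root mean squared error of prediction over a set of test samples.
rmsep <- function(obs, pred) sqrt(mean((obs - pred)^2))

## A single run: draw a stratified test set, fit each method to the
## remaining training samples, return one RMSEP per method.
one_run <- function(run, dat, test_frac = 0.25) {
  idx <- split(seq_len(nrow(dat)), dat$stratum)
  test_id <- unlist(lapply(idx, function(i) {
    sample(i, round(test_frac * length(i)))
  }))
  train <- dat[-test_id, ]
  test <- dat[test_id, ]
  ## Placeholder methods; substitute the real model-fitting functions.
  fits <- list(A = lm(y ~ 1, data = train),
               B = lm(y ~ x, data = train),
               C = lm(y ~ poly(x, 3), data = train))
  err <- vapply(fits, function(f) rmsep(test$y, predict(f, newdata = test)),
                numeric(1))
  data.frame(Run = run, method = names(fits), RMSEP = unname(err),
             row.names = NULL)
}

## Repeat the process, drawing a new test set each time.
n_runs <- 50
FOO <- do.call(rbind, lapply(seq_len(n_runs), one_run, dat = dat))
FOO$Run <- factor(FOO$Run)
FOO$method <- factor(FOO$method)
```

Each row of `FOO` is then one RMSEP value for one method in one run, which
is the layout the mixed model expects.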

Having done this I want to investigate if all methods have effectively
the same error performance and whether any of the m methods has better
or worse RMSEP performance via multiple pair-wise comparisons.

My approach has been to fit a linear mixed effects (LME) model, with a
single random effect for Run. I used lmer() from the lme4 package to fit
my model and functions from the multcomp package to perform the multiple
comparisons. My model was essentially

lmer(RMSEP ~ method + (1 | Run), data = FOO)

where method is a factor indicating which method was used to generate
the model predictions for the test set and Run is an indicator for each
particular Run of my "experiment". I used Tukey contrasts from the
multcomp package to do the multiple comparisons.
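
For concreteness, a hedged sketch of the fit and the Tukey all-pairs
comparisons. The simulated `FOO`, the method labels `A`, `B`, `C` and the
effect sizes are invented for illustration; only the model formula and the
use of lme4 and multcomp come from the description above.

```r
library(lme4)
library(multcomp)

## Simulated stand-in for the real results: 50 runs x 3 methods.
set.seed(1)
n_runs <- 50
FOO <- expand.grid(method = factor(c("A", "B", "C")),
                   Run = factor(seq_len(n_runs)))
run_eff <- rnorm(n_runs, sd = 0.3)        # per-run shift shared by all methods
meth_eff <- c(A = 0, B = 0.1, C = 0.4)    # assumed method differences
FOO$RMSEP <- 2 +
  unname(meth_eff[as.character(FOO$method)]) +
  run_eff[as.integer(FOO$Run)] +
  rnorm(nrow(FOO), sd = 0.1)

## Random intercept for Run, fixed effect for method.
mod <- lmer(RMSEP ~ method + (1 | Run), data = FOO)

## All pair-wise comparisons between methods (Tukey contrasts).
tuk <- glht(mod, linfct = mcp(method = "Tukey"))
summary(tuk)   # multiplicity-adjusted p-values
confint(tuk)   # simultaneous confidence intervals
```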

My question is in regard to the residuals of the LME. Given the single
random effect for Run, the model assumes that the RMSEP (response) values
(and hence residuals) within a run are correlated to some degree but are
uncorrelated between runs, on the basis of the induced correlation that
the random effect affords.
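
Written out, the covariance structure implied by a single random intercept
looks like the following. The notation is an assumption introduced here:
i indexes runs, j indexes methods, b_i is the Run effect and
\varepsilon_{ij} the residual error.

```latex
\begin{aligned}
y_{ij} &= \mu + \alpha_j + b_i + \varepsilon_{ij}, \qquad
  b_i \sim N(0, \sigma_b^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2), \\
\operatorname{Cov}(y_{ij}, y_{ij'}) &= \sigma_b^2 \quad (j \neq j'), \\
\operatorname{Cov}(y_{ij}, y_{i'j'}) &= 0 \quad (i \neq i'), \\
\operatorname{Corr}(y_{ij}, y_{ij'}) &= \frac{\sigma_b^2}{\sigma_b^2 + \sigma^2} \quad (j \neq j').
\end{aligned}
```

So every pair of RMSEP values from the same run shares the same correlation,
and any pair from different runs is treated as independent.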

Is this assumption of independence between runs valid? If not, is there a
way to account for this in the LME model, or should I be looking to
employ another type of statistical analysis to answer my question?


 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
