[R-sig-ME] Type of residuals for assessing assumptions in extended lme's

Thu Oct 2 08:09:47 CEST 2008

Dear all,

I am currently evaluating different models of haemodynamic reactions in the brain, so the relative concentrations of O2Hb and HHb are my response variables. These concentrations are recorded over relatively short stimulation periods (75 s) and are presently available at a rate of 1 Hz (one measurement per second). One stimulation period is our main observational unit, which is then nested in treatment, individual subject etc. I am trying to model the time course within the stimulation period using natural splines.

Not surprisingly, there is strong serial correlation in the raw / Pearson residuals. Using "correlation = corARMA(..., p = 2)" gets rid of it. There may be some heteroscedasticity as well, so an additional "weights = varIdent(...)" may become necessary (more on that below).

Main question:

My main issue now is which type of residuals is needed to check the assumptions of a model that includes a correlation structure (and may or may not include an additional variance structure). I have, of course, consulted Pinheiro & Bates (chapter 5). It seems obvious that normalised residuals are needed for checking whether the correlation structure succeeds in capturing the serial correlation (and there are examples of this in said book). I am not quite sure, but based on the gls examples towards the end of chapter 5 I also assume that the normalised residuals should be used in the other plots for assessing assumptions, such as normal plots, Tukey-Anscombe plots and so on (which would only be consistent). Is that correct in your view?
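To make the term concrete: as far as I understand the nlme documentation, normalised residuals are the standardised residuals pre-multiplied by the inverse square-root factor of the estimated error correlation matrix. The following is a minimal sketch of that idea in plain Python, not nlme code. The AR(2) coefficients, the number of periods, the seed and all function names are assumptions made up for illustration, and the true correlation matrix is used where a fitted model would use an estimate.

```python
import math
import random

PHI1, PHI2 = 0.6, 0.2       # made-up AR(2) coefficients, not estimates from real data
N_OBS, N_PERIODS = 75, 40   # 75 observations per period; 40 periods is arbitrary


def ar2_autocorrelations(phi1, phi2, n):
    """Autocorrelations rho_0..rho_{n-1} of a stationary AR(2) process (Yule-Walker)."""
    rho = [1.0, phi1 / (1.0 - phi2)]
    for k in range(2, n):
        rho.append(phi1 * rho[k - 1] + phi2 * rho[k - 2])
    return rho[:n]


def cholesky_lower(a):
    """Lower-triangular factor `low` with a = low * low^T."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def forward_solve(low, b):
    """Solve low * z = b for z, i.e. z = low^{-1} b."""
    z = []
    for i, row in enumerate(low):
        z.append((b[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    return z


def simulate_ar2(phi1, phi2, n, rng, burn_in=200):
    """Simulate n values of an AR(2) process with unit innovation variance."""
    x = [0.0, 0.0]
    for _ in range(burn_in + n):
        x.append(phi1 * x[-1] + phi2 * x[-2] + rng.gauss(0.0, 1.0))
    return x[-n:]


def pooled_lag1_autocorr(series_list):
    """Lag-1 autocorrelation pooled over several zero-mean series."""
    num = sum(s[t] * s[t - 1] for s in series_list for t in range(1, len(s)))
    den = sum(v * v for s in series_list for v in s)
    return num / den


rng = random.Random(1)
rho = ar2_autocorrelations(PHI1, PHI2, N_OBS)
corr = [[rho[abs(i - j)] for j in range(N_OBS)] for i in range(N_OBS)]
low = cholesky_lower(corr)

# Marginal standard deviation of the AR(2) process (innovation variance 1).
sd = math.sqrt(1.0 / (1.0 - PHI1 * rho[1] - PHI2 * rho[2]))

raw = [simulate_ar2(PHI1, PHI2, N_OBS, rng) for _ in range(N_PERIODS)]
standardised = [[v / sd for v in s] for s in raw]               # Pearson-type
normalised = [forward_solve(low, s) for s in standardised]      # decorrelated

acf_raw = pooled_lag1_autocorr(standardised)
acf_norm = pooled_lag1_autocorr(normalised)
var_norm = sum(v * v for s in normalised for v in s) / (N_OBS * N_PERIODS)

print("lag-1 autocorrelation, standardised residuals:", round(acf_raw, 3))
print("lag-1 autocorrelation, normalised residuals:  ", round(acf_norm, 3))
print("variance of normalised residuals:             ", round(var_norm, 3))
```

If the correlation structure is adequate, the normalised residuals should behave like independent draws with unit variance, which is why they would be the natural input for all further diagnostic plots. Note that each normalised residual is a linear combination of the current and preceding standardised residuals, so its distribution need not resemble that of the raw residuals.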

Secondary question:

What actually happens in our data set is that the normalised residuals no longer show any serial correlation but are far from normally distributed. Whereas the raw / Pearson residuals follow a normal distribution very closely and seem to be homoscedastic, the normalised residuals are heteroscedastic with respect to some explanatory variables and have much longer lower and upper tails than a normal distribution. Thus, if my notion stated above is correct (namely, that the normalised residuals should generally be used in assessing model assumptions if a correlation structure is present), our models may not be valid if based on a normal distribution.

I would see two basic solutions: (1) either use a model that allows for a distribution with longer tails than a normal or (2) thin the measurement series. lmer might accommodate another distribution family but - if I have followed the discussions on this list correctly - does not allow for a serial correlation structure. Thus, this approach might need a lot of new programming and development. The second approach would presume that not all the data are actually necessary to model the change in concentrations and that fewer observations are sufficient, with the advantage that they are serially uncorrelated. Any opinions on these two strategies?
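To give a rough feeling for what thinning would cost, here is a small Python sketch (again not nlme code). It relies on the fact that keeping every k-th observation turns the lag-k autocorrelation of the original series into the lag-1 autocorrelation of the thinned one. The AR(2) coefficients and the cut-off of 0.1 are purely hypothetical values chosen for illustration, not estimates from our data.

```python
PHI1, PHI2 = 0.6, 0.2   # made-up AR(2) coefficients, not estimates from real data
N_OBS = 75              # observations per stimulation period at 1 Hz
THRESHOLD = 0.1         # arbitrary cut-off for "practically uncorrelated"


def ar2_autocorrelations(phi1, phi2, max_lag):
    """Autocorrelations rho_0..rho_max_lag of a stationary AR(2) process."""
    rho = [1.0, phi1 / (1.0 - phi2)]
    for k in range(2, max_lag + 1):
        rho.append(phi1 * rho[k - 1] + phi2 * rho[k - 2])
    return rho


rho = ar2_autocorrelations(PHI1, PHI2, N_OBS - 1)

# Smallest thinning step whose neighbouring observations fall below the cut-off.
thin_step = next(k for k in range(1, N_OBS) if abs(rho[k]) < THRESHOLD)

# Number of observations left per period when keeping every thin_step-th point.
n_remaining = len(range(0, N_OBS, thin_step))

print("keep every", thin_step, "th observation;",
      n_remaining, "of", N_OBS, "observations remain per period")
```

With strong serial correlation the thinned series can become very short, which would limit how flexible a natural spline for the time course within a stimulation period could be.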

Many thanks for your time and ideas, Lorenz
- 
Lorenz Gygax, Dr. sc. nat.
Federal Veterinary Office
Centre for proper housing of ruminants and pigs
Tänikon, CH-8356 Ettenhausen / Switzerland