[R-sig-ME] advice on model selection and use of BLUPs needed

Kirstin-Friederike Heise kheise at uke.uni-hamburg.de
Mon Nov 14 17:00:22 CET 2011

Dear list, 

I rather stumbled into using mixed models to analyse a set of physiologic data acquired in an experimental setting with transcranial magnetic (TMS) stimulation. Having read about the basics (such as Goldstein, Twisk, and Pinheiro & Bates), I would very much appreciate some advice and comments on my approach so far helping me to correct and hopefully improve my analyses. 

In the following, I am trying to describe my experiment, which is a single group design with four between subjects factors and a within subjects factor and no experimental manipulation.  As far as I understood from reading various threats of this list, this experiment obviously deviates from the common study design here, so I hope my description is precise as well as concise enough for the list members.

1. library and methods used, model selection

In my 1st experiment I was interested in the question, whether a specific physiological parameter acquired with transcranial magnetic stimulation (TMS)  is primarily influenced by AGE(continuous), SEX(female=2, male=1), HEMISPHERE(left=1, right=2) of participants measured. My basic hypothesis is that AGE is the main factor to explain the variance. The sample was chosen with the aim to present a wide range of the adult age range (20-88 years of  age) and factors sex and hemisphere measured were fairly evenly distributed. 

The experiment consisted of a single session in which the primary outcome (SICI = motor evoked potentials elicited with TMS, i.e. the primary outcome is an electromyographic amplitude measured from peak to peak which by convention is then normalized to baseline, i.e. expressed as the percentage of a baseline amplitude). Hence, the data is highly positively skewed and did not conform to the central limits theorem, therefore I chose a logarithmic (natural) transformation (=> lnSICIm).

Now to one of the most important aspects of the study design: The primary outcome (lnSICI) was measured during a simple-reaction time task, in which the measurement stimulus was elicited at six different time zones (TIME) between the Go-signal and movement onset, adapted to the individual's reaction time. Therefore a within subject factor TIMEST (=linear estimation of individual reaction time in zones from -5 to 0 between Go signal and EMG onset, where -5 is close to Go and 0 is close before EMG onset) was introduced. The time zones are supposed to be related to each other but they were defined beforehand on the basis of the (highly jittering) individual reaction times and TMS pulse application was randomized over the order of time zones, therefore time zones do not represent the actual order of repeated measures during the experiment. **Nonetheless, the order of the time zones represents some kind or relation among each other and from previous data I know, that an increase in amplitude from time zone -5 to 0 can be observed, the trajectory of which is neither clearly described by an exponential nor an asymptotic function and which potentially changes completely with increasing age. Therefore, I also added a quadratic term, since this "modulation" does not seem to follow a purely linear regression line.**
Single trials were re-sampled off-line for analysis. I would like to analyze whether the "modulation" of the primary outcome (lnSICI) changes over the time zones and whether this modulations varies with AGE, SEX, or HEMISPHERE.  

Additionally, I wanted to control for the measurement immanent variable stimulus intensity rMT(continuous) which also varied between subjects. Moreover, I wanted to control for the influence of the level of the outcome parameter during resting condition (LNmeanSICIr), measured at the same day in a different session. 

The data matrix comprises of 3936 observations clustered in 64 groups, i.e. subjects, there are on average 10 trials per time zone per subject, but missing data exists. Here an example:

> head(dta)
  subject 	age 	hemisphere sex 	TIME 	SICI  		rMT 	TIMEST 	TIMEST2   LNmeanSICIr 
1 A1_BB         62 		1   			1    	1  		17.45646	37      	-5      	25 			3.325864  
2 A1_BB         62 		1   			1    	1  		27.56828	37      	-5      	25 			3.325864  
3 A1_BB         62 		1   			1    	1  		32.41933	37      	-5      	25 			3.325864  
4 A1_BB         62 		1   			1    	1  		28.31985	37      	-5      	25 			3.325864  
5 A1_BB         62 		1   			1    	1  		NA			37      	-5      	25			3.325864   
6 A1_BB         62 		1   			1    	1  		23.09312	37      	-5      	25 			3.325864  

I started with a model, which included all relevant factors and interactions and selected my final model by backward elimination of interactions and afterwards main factors, using -2LL and BIC criterion for model selection, in particular focussing on the random term. 

Since in my sample, the variables SEX and HEMISPHERE were somehow manipulated, i.e. my subjects were chosen according to these features, I left these factors in the model although they turned out not to be significant. 

Adding TIMEST and TIMEST2 to the random term highly improved the model.

Thus, my final model now would be the following:


mM1<- lme(lnSICIm~factor(sex)+factor(hemisphere)+age+TIMEST+TIMEST2+LNmeanSICIr+age*TIMEST+LNmeanSICIr*TIMEST,data=dta, random=~TIMEST+TIMEST2 | subject,na.action=na.exclude,method="REML")

Does this sound reasonable so far? I wonder whether using lme is really the right choice? Coming from a field where posthoc pairwise comparisons are the usual procedure, I wonder how to deal wether it would be an appropriate approach to compare age groups in a second step, since AGE itself as well as AGE*TIMEST turned out to be highly significant fixed effects. 

2. using BLUPs from (1.) to add as fixed effect into second part of experiment (behavioural data)

The next step for me would be to see if the task-related modulation of my outcome parameter is explaining the variance in behavioural data, collected in another session. (also see above **)

Here, I got the advice to use the BLUPs from my final model (using the random.effects(mM1) command?
I read a lot on the use of BLUPs and got the impression, that some controversy exists regarding the use of BLUPs. I was not able to find out whether this approach applies to my data and question at all. Using the BLUPs, they could only be added as fixed factors into the behavioural model, otherwise I ran into a convergence problem.

Does an alternative approach exist to describe the "modulation" of the outcome variable over the time zones, i.e. obtaining a single variable per subject which characterizes the "modulatory capacity" in order to be used as a variable in the next step of analysis, to see if any relationship between physiology and behaviour exists.

I absolutely thank you very much in advance for your valuable time and thoughts!


Kirstin Heise
PhD candidate
University Medical Center Hamburg-Eppendorf
Martinistr. 52
D-20246 Hamburg, Germany
kheise at uke.uni.de

-------------- next part --------------

Pflichtangaben gem?? Gesetz ?ber elektronische Handelsregister und Genossenschaftsregister sowie das Unternehmensregister (EHUG):

Universit?tsklinikum Hamburg-Eppendorf; K?rperschaft des ?ffentlichen Rechts; Gerichtsstand: Hamburg

Vorstandsmitglieder: Prof. Dr. Guido Sauter (Vertreter des Vorsitzenden), Dr. Alexander Kirstein, Joachim Pr?l?, Prof. Dr. Dr. Uwe Koch-Gromus 

More information about the R-sig-mixed-models mailing list