[R-sig-ME] advice on model selection and use of BLUPs needed
kheise at uke.uni-hamburg.de
Mon Nov 14 17:00:22 CET 2011
I rather stumbled into using mixed models to analyse a set of physiologic data acquired in an experimental setting with transcranial magnetic (TMS) stimulation. Having read about the basics (such as Goldstein, Twisk, and Pinheiro & Bates), I would very much appreciate some advice and comments on my approach so far helping me to correct and hopefully improve my analyses.
In the following, I am trying to describe my experiment, which is a single group design with four between subjects factors and a within subjects factor and no experimental manipulation. As far as I understood from reading various threats of this list, this experiment obviously deviates from the common study design here, so I hope my description is precise as well as concise enough for the list members.
1. library and methods used, model selection
In my 1st experiment I was interested in the question, whether a specific physiological parameter acquired with transcranial magnetic stimulation (TMS) is primarily influenced by AGE(continuous), SEX(female=2, male=1), HEMISPHERE(left=1, right=2) of participants measured. My basic hypothesis is that AGE is the main factor to explain the variance. The sample was chosen with the aim to present a wide range of the adult age range (20-88 years of age) and factors sex and hemisphere measured were fairly evenly distributed.
The experiment consisted of a single session in which the primary outcome (SICI = motor evoked potentials elicited with TMS, i.e. the primary outcome is an electromyographic amplitude measured from peak to peak which by convention is then normalized to baseline, i.e. expressed as the percentage of a baseline amplitude). Hence, the data is highly positively skewed and did not conform to the central limits theorem, therefore I chose a logarithmic (natural) transformation (=> lnSICIm).
Now to one of the most important aspects of the study design: The primary outcome (lnSICI) was measured during a simple-reaction time task, in which the measurement stimulus was elicited at six different time zones (TIME) between the Go-signal and movement onset, adapted to the individual's reaction time. Therefore a within subject factor TIMEST (=linear estimation of individual reaction time in zones from -5 to 0 between Go signal and EMG onset, where -5 is close to Go and 0 is close before EMG onset) was introduced. The time zones are supposed to be related to each other but they were defined beforehand on the basis of the (highly jittering) individual reaction times and TMS pulse application was randomized over the order of time zones, therefore time zones do not represent the actual order of repeated measures during the experiment. **Nonetheless, the order of the time zones represents some kind or relation among each other and from previous data I know, that an increase in amplitude from time zone -5 to 0 can be observed, the trajectory of which is neither clearly described by an exponential nor an asymptotic function and which potentially changes completely with increasing age. Therefore, I also added a quadratic term, since this "modulation" does not seem to follow a purely linear regression line.**
Single trials were re-sampled off-line for analysis. I would like to analyze whether the "modulation" of the primary outcome (lnSICI) changes over the time zones and whether this modulations varies with AGE, SEX, or HEMISPHERE.
Additionally, I wanted to control for the measurement immanent variable stimulus intensity rMT(continuous) which also varied between subjects. Moreover, I wanted to control for the influence of the level of the outcome parameter during resting condition (LNmeanSICIr), measured at the same day in a different session.
The data matrix comprises of 3936 observations clustered in 64 groups, i.e. subjects, there are on average 10 trials per time zone per subject, but missing data exists. Here an example:
subject age hemisphere sex TIME SICI rMT TIMEST TIMEST2 LNmeanSICIr
1 A1_BB 62 1 1 1 17.45646 37 -5 25 3.325864
2 A1_BB 62 1 1 1 27.56828 37 -5 25 3.325864
3 A1_BB 62 1 1 1 32.41933 37 -5 25 3.325864
4 A1_BB 62 1 1 1 28.31985 37 -5 25 3.325864
5 A1_BB 62 1 1 1 NA 37 -5 25 3.325864
6 A1_BB 62 1 1 1 23.09312 37 -5 25 3.325864
I started with a model, which included all relevant factors and interactions and selected my final model by backward elimination of interactions and afterwards main factors, using -2LL and BIC criterion for model selection, in particular focussing on the random term.
Since in my sample, the variables SEX and HEMISPHERE were somehow manipulated, i.e. my subjects were chosen according to these features, I left these factors in the model although they turned out not to be significant.
Adding TIMEST and TIMEST2 to the random term highly improved the model.
Thus, my final model now would be the following:
mM1<- lme(lnSICIm~factor(sex)+factor(hemisphere)+age+TIMEST+TIMEST2+LNmeanSICIr+age*TIMEST+LNmeanSICIr*TIMEST,data=dta, random=~TIMEST+TIMEST2 | subject,na.action=na.exclude,method="REML")
Does this sound reasonable so far? I wonder whether using lme is really the right choice? Coming from a field where posthoc pairwise comparisons are the usual procedure, I wonder how to deal wether it would be an appropriate approach to compare age groups in a second step, since AGE itself as well as AGE*TIMEST turned out to be highly significant fixed effects.
2. using BLUPs from (1.) to add as fixed effect into second part of experiment (behavioural data)
The next step for me would be to see if the task-related modulation of my outcome parameter is explaining the variance in behavioural data, collected in another session. (also see above **)
Here, I got the advice to use the BLUPs from my final model (using the random.effects(mM1) command?
I read a lot on the use of BLUPs and got the impression, that some controversy exists regarding the use of BLUPs. I was not able to find out whether this approach applies to my data and question at all. Using the BLUPs, they could only be added as fixed factors into the behavioural model, otherwise I ran into a convergence problem.
Does an alternative approach exist to describe the "modulation" of the outcome variable over the time zones, i.e. obtaining a single variable per subject which characterizes the "modulatory capacity" in order to be used as a variable in the next step of analysis, to see if any relationship between physiology and behaviour exists.
I absolutely thank you very much in advance for your valuable time and thoughts!
University Medical Center Hamburg-Eppendorf
D-20246 Hamburg, Germany
kheise at uke.uni.de
-------------- next part --------------
Pflichtangaben gem?? Gesetz ?ber elektronische Handelsregister und Genossenschaftsregister sowie das Unternehmensregister (EHUG):
Universit?tsklinikum Hamburg-Eppendorf; K?rperschaft des ?ffentlichen Rechts; Gerichtsstand: Hamburg
Vorstandsmitglieder: Prof. Dr. Guido Sauter (Vertreter des Vorsitzenden), Dr. Alexander Kirstein, Joachim Pr?l?, Prof. Dr. Dr. Uwe Koch-Gromus
More information about the R-sig-mixed-models