[R-sig-ME] Using lmer in survival

Flores-de-Gracia, Eric eef201 at exeter.ac.uk
Sun Jan 22 23:38:35 CET 2012


Hi there

I´m trying to fit a lmer model to a data on survival (1=death, 0=alive) of artificial frog models deployed in the field. At this early point want to know if "size" has an general effect on survival

I have found difficult to understand the model goodness of fit using the model simplification. Using a randomized block design, my response variable is "surv" and my predictor is "size".

model1<-lmer(surv~size+(1|block/size),family="binomial")
summary(model1)

model2<-update(model1,~.-size)

anova(model1,model2)

But the plot of residuals against the predicted values looks quite strange, perhaps due to the huge amount of "0" in the dataset! (562 ceros vs 35 ones).

Is there any other error structure that can be implemented to deal with the huge amount of ceros?

Any suggestion welcome, thanks.
 
Eric Flores De Gracia
School of Biosciences
University of Exeter, Cornwall Campus
Penryn, Cornwall
TR10 9EZ
United Kingdom
Mobilephone: +044 07578724705
http://biosciences.exeter.ac.uk/staff/postgradresearch/ericflores/
________________________________________
De: r-sig-mixed-models-bounces at r-project.org [r-sig-mixed-models-bounces at r-project.org] En nombre de Antoine Tremblay [trea26 at gmail.com]
Enviado el: domingo, 22 de enero de 2012 18:13
Para: r-sig-mixed-models at r-project.org
CC: r-sig-mixed-models-request at r-project.org
Asunto: Re: [R-sig-ME] lme or lmer?

Giorgio, another thing regarding the Interval variable that came to my
attention after reading Ben Bolker's reply to your question.

Were you planning on adding Interval to your model specification?
Because it *shouldn't* be in the model. From what I can see, you'd be
performing an analysis on the 200-300 time window, then another one on
the 400-600 time window, and so forth, and in each analysis you'd
include all electrodes (which, as I mentioned in my reply, is probably
best re-coded as ROI with 9 levels).

I'm assuming you chose these time windows because you were expecting to
find amplitude differences between your two conditions with respect to
ERPs found in these time windows (e.g., maybe a late N400 in the 400-600
window?).

So, for *each* time window the model would look something like this
(after recoding Electrode to ROI and your datya frame is named eeg):

m1 <- lmer(Ampl ~ Condition * ROI + (1|Subject) + (1|Item) +
(ROI|Subject), data = eeg)

Would check model assumptions using e.g., function mcp.fnc from package
LMERConvenienceFunctions.

mcp.fnc(m1, trim = 2.5)

Typically the qq plot top right isn't good and there are a lot of data
points with undue influence (bottom right panel). But after removing
outliers +/- 2.5 std. dev. below and above the m1 residuals mean (could
use function romr.fnc from package LMERConvenienceFunctions) the plots
look much, much better.

eeg<- romr.fnc(model = m1, data = eeg, trim = 2.5)$data

Typically 1.5% to 2.5% of the data is removed this way (the function
tells you the percentage). Then refit the model:

m1 <- update(m1)

Then look at the model criticism plot:

mcp.fnc(m1, trim = 2.5)


In addition, you could start with a simple model with only (1|Subject)
then fit a more complex model with (1|Subject) + (1|Item) then a more
complex one with (1|Subject) + (1|Item) + (ROI|Subject) ... and test
whether the inclusion of the random effects is warranted using a log
likelihood ratio test (anova(m1,m2) ...). I'd bet that (1|Subject) +
(1|Item) + (ROI|Subject) will be warranted, maybe also (Condition|Subject).

Sincerely,

Antoine Tremblay
NeuroCognitive Imaging Laboratory
Dalhousie university
Halifax, Canada

> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sat, 21 Jan 2012 15:54:00 +0000 (UTC)
> From: Ben Bolker<bbolker at gmail.com>
> To: r-sig-mixed-models at r-project.org
> Subject: Re: [R-sig-ME] lme or lmer?
> Message-ID:<loom.20120121T041825-883 at post.gmane.org>
> Content-Type: text/plain; charset=us-ascii
>
> Giorgio Arcara<giorgio.arcara at ...>  writes:
>
>> I would like to use mixed models in R to analyze EEG data, but I don't
>> know if it is more correct to use lme or lmer.
>> My data have the following structure
>>
>> Subject      Electrode        Interval       Trial   Condition       Ampl
>> 1            Fp1             200-300         1       A               3.5
>> 1            Fp1             200-300         2       B               4.2
>> 1            Fp2             400-600         1       A               6.5
>> 1            Fp2             400-600         2       B               3.3
>> 2            Fp1             200-300         1       A               2.1
>> 2            Fp1             200-300         2       B               -5.4
>> 2            Fp2             400-600         1       A               -5.6
>> 2            Fp2             400-600         2       B               -3.2
>
>   [snip]
>
>    In general if you *can* use either lme or lmer (i.e. the random effects are
> not nested, there is no heteroscedasticity or correlation structure at the
> level of residuals ...) then either is OK.  I generally recommend lme in
> these cases (with apologies to Doug Bates) because it is better documented,
> although it's slower.
>
>> For sake of simplicity, here I include only 2 Trials but in the real
>> dataset they are many more.
>> In this hypothetical dataset Ampl is the depentent variable. Electrode
>> and Interval are two predictors. I expect that levels of Electrode
>> will be highly correlated as well the levels of Interval.
>
>    ?? Meaning that most instances of FP1 have Interval 200-300, most
> of Fp2 have 400-500, and there are few 'crossover' instances where you
> have FP1 with 400-500 and FP2 with 200-300?  This will in general make
> it hard to estimate to distinguish the two effects ... but presumably
> you know that.
>
>> My goal is to study if Condition influence Ampl and if interact with
>> Electrode variable and Interval Variable.
>>
>> I would fit a model on these data with lmer with the following structure
>>
>> mod=lmer(Ampl~Electrode*Interval*Condition+(1+Electrode|Subject)
>> +(1+Interval|Subject))
>> If I'm correct the corresponding lme model would be
>>
>> mod=lme(Ampl~Electrode*Interval*Condition, random=list(~1+condition|
>> Subject, ~1+Interval|Subject))
>
>    Why not (Electrode+Interval|Subject)?  That (a) avoids a duplicated intercept
> term and (b) estimates correlations between the intercept, electrode, and
> Interval RE.
>
>> Any suggestion for covariance matrix specification in lme?
>
>    Do you need one other than the default -- i.e. do you expect
> any particular correlation structure?
>
>
>
> ------------------------------
>
> Message: 2
> Date: Sat, 21 Jan 2012 12:55:50 -0400
> From: Antoine Tremblay<trea26 at gmail.com>
> To: r-sig-mixed-models at r-project.org
> Cc: r-sig-mixed-models-request at r-project.org
> Subject: Re: [R-sig-ME] lme or lmer?
> Message-ID:<4F1AEE16.3090801 at gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Hey Giorgio!
>
> I have one published paper in JoCN where we analyzed ERP data using
> LMER, another one is on the way, and a methods paper is almost finished
> and submitted to NeuroImage regarding the addition of by-item random
> intercepts and slopes to the model. I'm sending you a copy of the JoCN
> paper (proofs for now, it'll be officially out any time soon) and a copy
> of the almost finished NeuroImage paper. If others are interested,
> please let me know and I can send them to you (trea26 at gmail dot com).
>
> So LMER-wise, adding (1+Electtrode|Subject), which you could simply
> write (Electrode|Subject) is correct, but fitting a model with such a
> random effect will take forever.
>
> In NewmanTremblay2011 paper we collapsed Electrode into 9 ROIs: left
> anterior, midline anterior, right anterior, left central, midline
> central, right central, left posterior, midline posterior, and right
> posterior and use that new variable (ROI) instead of Electrode. Note
> that we are not averaging at all! The main reason for this is
> computation time: With 9 levels of ROI in the model as (ROI|Subject) it
> takes A LOT of time to fit, tried once with (Electrode|Subject) and
> would run for sooooo long, actually killed it after like two or three
> days. Even the model with ROI takes what seems to be forever.
>
> Then, once fitted, you'll see with print(model) that in the random
> effects portion of the summary there's a table of correlations between
> levels of ROI, something like this from one of our models (incomplete):
>
> Linear mixed model fit by REML
> Formula: Ampl ~ rProficiency * Condition * ROI * Group + (Condition |
>    Subject) + (1 | Subject) + (ROI | Subject)
>      Data: dat
>        AIC     BIC  logLik deviance REMLdev
>    1011929 1013147 -505843  1011340 1011685
> Random effects:
>    Groups   Name          Variance   Std.Dev. Corr
>    Subject  (Intercept)   2.5852e-01 0.508447
>             ConditionGood 1.4045e+00 1.185133 -0.681
>    Subject  (Intercept)   1.8638e-04 0.013652
>    Subject  (Intercept)   2.8105e+00 1.676449
>             ROILcent      9.4985e-01 0.974604 -0.423
>             ROILpost      3.2123e+00 1.792281 -0.644  0.833
>             ROIMant       3.3728e-01 0.580755 -0.240  0.150  0.258
>             ROIMcent      1.8586e+00 1.363301 -0.249  0.687  ...
>             ROIMpost      2.8172e+00 1.678445 -0.577  0.746  ...
>             ROIRant       7.5682e-01 0.869957 -0.421 -0.171  ...
>             ROIRcent      1.4031e+00 1.184519 -0.840  0.531  ...
>             ROIRpost      3.0983e+00 1.760201 -0.797  0.578  ...
>    Residual               3.3207e+01 5.762530
> Number of obs: 159374, groups: Subject, 44
>
> I'm attaching a complete report (using Sweave) for your reference.
>
> There's one other paper where they used lme with a corSpher function in
> Davidson2007 (attached here, see page 90). The problem i see with using
> lme is that you can't really add crossed by-item random effect, which
> you should as demonstrated in Tremblay2012. Basically, adding by-item
> random effects substantially decreases the amount of (partial)
> autocorrelation in the model residuals (i.e., better approximation of
> the assumption of independence of errors. Note that the data used in
> that paper is available on CRAN (data package LCFdata) and I can send
> you the .Rnw file that contains the R code used for data manipulation,
> analysis and plotting in Tremblay2012, so it's fully replicable. Note
> that for simplicity, analyses in Tremblay2012 are on a single electrode,
> and that I'm not comparing (ROI|Subject) in LMER to corSpher in LME.
> Would like to eventually look a variograms to see how well the two take
> care of spatial correlation and also compare them with a model that
> doesn't account for this correlation.
>
> Note that, as I demonstrate in Tremblay2012, by-item random effect
> should be added if warranted (by Log-likelihood Ratio Test or other; I
> suspect it will always be).
>
>
> Cheers,
>
> Antoine
>
>
> End of R-sig-mixed-models Digest, Vol 61, Issue 31
> **************************************************
>

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models




More information about the R-sig-mixed-models mailing list