[R-sig-ME] Mixed models for repeated measures with missing values

Wed Aug 9 17:53:05 CEST 2017

Instead of trying to determine the correct polynomial degree, why not
use a GAM and let the smoother do that work for you? If you absolutely
need to use a polynomial model, you can pick the degree via
cross-validation, but this is computationally expensive and only works
if you have enough data to be able to do cross-validation. To fit
this as a GAM, you can use either mgcv::gam() or gamm4::gamm4().
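
As a rough sketch of what that could look like with mgcv (the data below
are simulated, and the column names Y, Div, Time, block and plot are only
assumed from your description, so adapt as needed):

```r
library(mgcv)

set.seed(1)
# Simulated stand-in data: 20 plots in 4 blocks, unevenly spaced times.
n_plot <- 20
times  <- c(1, 2, 4, 7, 11, 16, 22, 30)
dat <- expand.grid(Time = times, plot = factor(seq_len(n_plot)))
dat$block <- factor((as.integer(dat$plot) - 1) %% 4 + 1)
div_by_plot <- sample(c(1, 2, 4, 8, 16), n_plot, replace = TRUE)
dat$Div <- div_by_plot[as.integer(dat$plot)]
plot_eff <- rnorm(n_plot, sd = 0.3)
dat$Y <- sin(dat$Time / 5) + 0.05 * dat$Div +
  plot_eff[as.integer(dat$plot)] + rnorm(nrow(dat), sd = 0.2)
# Drop some rows at random to mimic missing observations.
dat <- dat[-sample(nrow(dat), 15), ]

# s(Time) is the average time trend; s(Time, by = Div) lets the effect
# of Div vary smoothly over time (a numeric 'by' smooth is not centred,
# so it also carries the overall Div effect). The bs = "re" terms are
# random intercepts for block and plot.
gam_fit <- gam(Y ~ s(Time, k = 6) + s(Time, by = Div, k = 6) +
                 s(block, bs = "re") + s(plot, bs = "re"),
               data = dat, method = "REML")
summary(gam_fit)
```

The basis dimension k = 6 is an arbitrary choice for this toy data set;
gam.check(gam_fit) gives a rough indication of whether it is too small.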

Depending on how well that works, the heteroskedasticity issue may solve
itself. But otherwise, I don't know if weights are really the correct
way to address that issue -- do you have a principled reason for saying
that some observations are more reliable / important than others? If
not, weights don't really make sense.

Finally, it is possible but a bit weird to have something as a fixed
effect and a grouping term in the random effects. Thierry Onkelinx has
posted an example online: https://rpubs.com/INBOstats/both_fixed_random
but again, I would see if using a GAMM makes all this moot.
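
For what it's worth, a minimal sketch of that idea with nlme might look
like the following. This is not a reproduction of Thierry's example; the
data are simulated and fTime is a made-up name for the factor copy of
Time:

```r
library(nlme)

set.seed(2)
# Simulated data: 8 unevenly spaced time points, 15 observations each.
times <- c(1, 2, 4, 7, 11, 16, 22, 30)
dat2 <- data.frame(Time = rep(times, each = 15))
dat2$fTime <- factor(dat2$Time)   # same variable, as a grouping factor
time_dev <- rnorm(length(times), sd = 0.5)
dat2$Y <- 1 + 0.1 * dat2$Time + time_dev[as.integer(dat2$fTime)] +
  rnorm(nrow(dat2), sd = 0.3)

# Time enters as a continuous fixed effect (linear trend), while fTime
# gives each time point its own random deviation from that trend.
lme_fit <- lme(Y ~ Time, random = ~ 1 | fTime, data = dat2)
fixef(lme_fit)
VarCorr(lme_fit)
```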

Best,
Phillip

On 05/19/2017 12:06 PM, Hongmei Chen wrote:
> Dear mixed model users,
> 
> I have a question concerning using mixed models for repeated measures
> with missing values. I would like to test the effects of plant diversity
> (Div., continuous) on root growth rate (Y, continuous) over time (Time).
> However, time is not evenly spaced, thus I think it is better to use
> Time as a continuous term. In addition, the root growth rate - time
> relationship was not linear. Thus I used poly (Time, N) to account for
> the non-linear relationship. In addition, the plots were randomly
> arranged in 4 blocks, so I would also like to account for the potential
> block effect.
> 
> I use lme from nlme for the data analyses.
> I first tested the polynomial order by increasing n from 2 to 3.
> However, AIC suggests that I should use 5 or 6. I am afraid I will
> overfit the data. Thus I chose poly(Time, 3).
> Mod1 <- lme (Y ~ Div * poly(Time, 3), random = ~1|block/plot, method="ML")
> 
> Because of repeated measurement, I included e.g. correlation= corRatio
> (form=~Time, nugget=T) to account for the dependence. I tested different
> correlation structures and chose the one with lowest AIC.
> Mod2 <- lme (Y ~ Div * poly(Time, 3), random = ~1|block/plot,
> method="REML", correlation=  corRatio (form=~Time, nugget=T))
> 
> After checking the residuals, I could still see some trend over time and
> heterogeneity in the residuals, e.g. at different times. I therefore
> included the "weights" argument in the model.
> Mod3 <- lme (Y ~ Div * poly(Time, 3), random = ~1|block/plot,
> method="REML", correlation=  corRatio (form=~Time, nugget=T),
> weights=varIdent(form=~1|plot))
> 
> My questions are
> 1) Should I go for a higher order in the polynomial term? For example 4
> 2) For the random term, I used the simplest one. Would you consider
> other options e.g. random = ~poly(Time, 3)|block/plot
> 3) Can I use time as a continuous term in fixed part but as a factor in
> the random part or in the weights argument  like:
> weights=varIdent(form=~1|as.factor(Time))
> 
> Because of the missing observation values, I could only use mixed
> models for my data. I have googled this for quite a long time but
> could not find a good example or solution. Any suggestions are welcome.
> 
> Kind regards,
> Hongmei Chen
>