[R-sig-ME] Nested subject-longitudinal logit design

Thu May 3 20:44:23 CEST 2012

arun <smartpink111 at ...> writes:

> I posted recently regarding inputs for a longitudinal logit model
> using lme4.  

[snip]

> ... I am measuring the binary response (1- insect present in light
> area, 0- insect present in dark area) as dependent variable, with
> fixed effects of Wavelength (3 levels) of light applied on insect in
> a petridish (half-covered with aluminium foil) for a period starting
> from 1 min to 20 min.   Depending on the starting response (animal
> present in dark or light zone of petridish at 0 min), there is
> another factor (Start_Resp - 2 levels - L starting in light zone, D-
> starting in dark zone).  It was really important to introduce this
> factor, as the response is drastically different in both levels of
> the factor for each of the wavelengths.   Since, we are measuring
> the response every 1 min (0 or 1), for 20 min, time is also a factor
> with 20 levels or a covariate.   In the present model, I introduce
> time as a covariate and extracted deviations for individual
> measurements as a random effect in "resid".

> BehavdatOrig$resid<-as.factor(1:dim(BehavdatOrig)[1])

  I would say that factor(1:nrow(BehavdatOrig)) is slightly
more readable, but OK

[A] (fm<-lmer(Response~Wavelength*Start_Resp*time+(1|resid)+(1+time|Subject), 
     family=binomial,data=BehavdatOrig)

This seems reasonable

> If Wavelength, starting response are fixed effects and time as
> covariate, I think I can have an interaction term as
> Wavelength*Start_Resp*time.  But, subjects are not repeated for the
> experiment.  

I'm guessing this means that each individual is only measured
in a single level of Wavelength*Start_Resp ... ?

> If I have column called "Rep" (replication) in the
> dataset for each treatment (another column - Treatment = Combination
> of Wavelength*Strain*Start_Resp) does it make sense to introduce
> Subject as nested within Treatment using the 

 > BehavdatOrig <- within(BehavdatOrig, Subject <- factor(Treatment:Rep)) 

How is Subject coded?  i.e. is it coded 1..n_i for each Treatment:Rep
combination (explicit nesting), or is it coded 1..N for the entire
data set (implicit nesting)?

Similarly, how is Rep coded?

Assuming for the moment that Subject is implicitly nested and Rep is
explicitly nested, and that there is more than one Subject per Rep
(and that Rep is not crossed with Subject, i.e. each Subject is
measured only within a single Rep), then you should use something like

BehavdatOrig$RepNest <- with(BehavdatOrig,interaction(Treatment,Rep))

(1|resid) + (1+time|Subject) + (1|RepNest)

This allows for variation in intercept and slope across Subject, and
intercept (only) across RepNest.

  This specification would also be correct if Subject *were* crossed
with RepNest and numbered appropriately (i.e. 1..n for each level of
RepNest).  The only problem is if it is explicitly nested, in which
case you need (1+time|Subject:RepNest)

> It is an unbalanced dataset with respect to number of replications. 
> I made a mistake in my previous analysis as I used Subject as
> replicates which gave me completely different results.  The shrink
> fit model graph for each subject looks to have small deviations from
> the population.  In this scenario, should I have to replace this
> model with quasibinomial model.

  I don't know what the "shrink fit model graph" is ...

>  Quasibinomial model with cbind( sum of response for every 5 minute,
> and 5-response) with this new settings.  For one thing, there is no
> quasibinomial link function in lmer, still I used binomial link to
> get results.  Another problem with quasibinomial is that I am not
> able to get the shrink fit graph with quasibinomal response.  Should
> be a problem with lattice graph plots.  Any advice on this direction
> will be appreicated.

   Not sure what's going on here.  Are you using

glmmPQL(...,family="quasibinomial")

(which is the only way I know of to fit quasibinomial GLMMs in R?)

Your inclusion of the 'resid' random effect above should have
taken care of overdispersion.

# The above fm model provides output as:
# Formula: Response ~ Wavelength * Start_Resp * time + 
## (1 | resid) + (1 + time | Subject) 
#    Data: BehavdatOrig 
#   AIC  BIC logLik deviance
#  1323 1413 -645.7     1291
# Random effects:
#  Groups  Name        Variance   Std.Dev.   Corr   
#  resid   (Intercept) 8.3103e-12 2.8828e-06        
#  Subject (Intercept) 1.6026e+01 4.0033e+00        
#          time        1.2760e-01 3.5722e-01 -0.552 
# Number of obs: 1960, groups: resid, 1960; Subject, 98
# 
# I also tried uncorrelated model fma. 
# 
# Formula: Response ~ Wavelength * Start_Resp * time + 
## (1 | resid) + 
## (1 |     Subject) + (0 + time |
# Subject) 
#    Data: BehavdatOrig 
#   AIC  BIC logLik deviance
#  1333 1417 -651.6     1303
# Random effects:
#  Groups  Name        Variance   Std.Dev.  
#  resid   (Intercept) 6.9660e-14 2.6393e-07
#  Subject time        1.0964e-01 3.3112e-01
#  Subject (Intercept) 1.1732e+01 3.4252e+00
# et results:
# 
# Anova comparison favors the correlated model:
# > anova(fma,fm)
# Data: BehavdatOrig
# Models:
# fma: Response ~ Wavelength * Start_Resp * time + (1 | resid) + (1 | 
# fma:     Subject) + (0 + time | Subject)
# fm: Response ~ Wavelength * Start_Resp * time + (1 | resid) + (1 + 
# fm:     time | Subject)
#     Df    AIC    BIC  logLik  Chisq Chi Df Pr(>Chisq)    
# fma 15 1333.2 1416.9 -651.57                             
# fm  16 1323.4 1412.7 -645.69 11.767      1  0.0006031 ***