[R-sig-ME] Nested subject-longitudinal logit design
arun
smartpink111 at yahoo.com
Thu May 3 21:12:32 CEST 2012
Hi Ben,
Thanks for the quick response.
----- Original Message -----
From: Ben Bolker <bbolker at gmail.com>
To: r-sig-mixed-models at r-project.org
Cc:
Sent: Thursday, May 3, 2012 2:44 PM
Subject: Re: [R-sig-ME] Nested subject-longitudinal logit design
arun <smartpink111 at ...> writes:
> I posted recently regarding inputs for a longitudinal logit model
> using lme4.
[snip]
> ... I am measuring the binary response (1- insect present in light
> area, 0- insect present in dark area) as dependent variable, with
> fixed effects of Wavelength (3 levels) of light applied on insect in
> a petridish (half-covered with aluminium foil) for a period starting
> from 1 min to 20 min. Depending on the starting response (animal
> present in dark or light zone of petridish at 0 min), there is
> another factor (Start_Resp - 2 levels - L starting in light zone, D-
> starting in dark zone). It was really important to introduce this
> factor, as the response is drastically different in both levels of
> the factor for each of the wavelengths. Since, we are measuring
> the response every 1 min (0 or 1), for 20 min, time is also a factor
> with 20 levels or a covariate. In the present model, I introduce
> time as a covariate and extracted deviations for individual
> measurements as a random effect in "resid".
> BehavdatOrig$resid<-as.factor(1:dim(BehavdatOrig)[1])
I would say that factor(1:nrow(BehavdatOrig)) is slightly
more readable, but OK
[A] (fm<-lmer(Response~Wavelength*Start_Resp*time+(1|resid)+(1+time|Subject),
family=binomial,data=BehavdatOrig)
This seems reasonable
> If Wavelength, starting response are fixed effects and time as
> covariate, I think I can have an interaction term as
> Wavelength*Start_Resp*time. But, subjects are not repeated for the
> experiment.
I'm guessing this means that each individual is only measured
in a single level of Wavelength*Start_Resp ... ?
Yes, it is only measured once in a single level of Wavelength*start_resp for 20 minutes. After that, the animal is discarded.
> If I have column called "Rep" (replication) in the
> dataset for each treatment (another column - Treatment = Combination
> of Wavelength*Strain*Start_Resp) does it make sense to introduce
> Subject as nested within Treatment using the
> BehavdatOrig <- within(BehavdatOrig, Subject <- factor(Treatment:Rep))
How is Subject coded? i.e. is it coded 1..n_i for each Treatment:Rep
combination (explicit nesting), or is it coded 1..N for the entire
data set (implicit nesting)?
Similarly, how is Rep coded?
Subject is coded from 1..N (implicit nesting). Rep is coded from 1:n_i for each treatment combination
Number Wavelength Strain Start_Resp Treatment Rep Subject time Response
1 Red GAI L RedGAI_L 1 1 1 1
2 Red GAI L RedGAI_L 1 1 2 1
3 Red GAI L RedGAI_L 1 1 3 1
4 Red GAI L RedGAI_L 1 1 4 0
5 Red GAI L RedGAI_L 1 1 5 0
6 Red GAI L RedGAI_L 1 1 6 0
7 Red GAI L RedGAI_L 1 1 7 0
8 Red GAI L RedGAI_L 1 1 8 0
9 Red GAI L RedGAI_L 1 1 9 0
10 Red GAI L RedGAI_L 1 1 10 0
11 Red GAI L RedGAI_L 1 1 11 0
12 Red GAI L RedGAI_L 1 1 12 0
13 Red GAI L RedGAI_L 1 1 13 0
14 Red GAI L RedGAI_L 1 1 14 0
15 Red GAI L RedGAI_L 1 1 15 0
16 Red GAI L RedGAI_L 1 1 16 0
17 Red GAI L RedGAI_L 1 1 17 0
18 Red GAI L RedGAI_L 1 1 18 0
19 Red GAI L RedGAI_L 1 1 19 0
20 Red GAI L RedGAI_L 1 1 20 0
21 Red GAI L RedGAI_L 2 2 1 1
22 Red GAI L RedGAI_L 2 2 2 1 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1921 Green GAI D Green_GAI_D 21 97 1 0
1922 Green GAI D Green_GAI_D 21 97 2 0
1923 Green GAI D Green_GAI_D 21 97 3 0
1924 Green GAI D Green_GAI_D 21 97 4 0
1925 Green GAI D Green_GAI_D 21 97 5 0
1926 Green GAI D Green_GAI_D 21 97 6 0
1927 Green GAI D Green_GAI_D 21 97 7 0
1928 Green GAI D Green_GAI_D 21 97 8 0
Assuming for the moment that Subject is implicitly nested and Rep is
explicitly nested, and that there is more than one Subject per Rep
(and that Rep is not crossed with Subject, i.e. each Subject is
measured only within a single Rep), then you should use something like
BehavdatOrig$RepNest <- with(BehavdatOrig,interaction(Treatment,Rep))
(1|resid) + (1+time|Subject) + (1|RepNest)
This allows for variation in intercept and slope across Subject, and
intercept (only) across RepNest.
It makes sense.
This specification would also be correct if Subject *were* crossed
with RepNest and numbered appropriately (i.e. 1..n for each level of
RepNest). The only problem is if it is explicitly nested, in which
case you need (1+time|Subject:RepNest)
This is not the case.
> It is an unbalanced dataset with respect to number of replications.
> I made a mistake in my previous analysis as I used Subject as
> replicates which gave me completely different results. The shrink
> fit model graph for each subject looks to have small deviations from
> the population. In this scenario, should I have to replace this
> model with quasibinomial model.
I don't know what the "shrink fit model graph" is ...
I got the example graph from Douglas Bates lme4 chapter4.R (sleepstudy dataset). I used the same settings, only changed the coefficient list to be used.
> Quasibinomial model with cbind( sum of response for every 5 minute,
> and 5-response) with this new settings. For one thing, there is no
> quasibinomial link function in lmer, still I used binomial link to
> get results. Another problem with quasibinomial is that I am not
> able to get the shrink fit graph with quasibinomal response. Should
> be a problem with lattice graph plots. Any advice on this direction
> will be appreicated.
Not sure what's going on here. Are you using
glmmPQL(...,family="quasibinomial")
No, I was using lmer with binomial link.
(which is the only way I know of to fit quasibinomial GLMMs in R?)
Your inclusion of the 'resid' random effect above should have
taken care of overdispersion.
Let me compare the results from lmer and glmmPQL.
Thank you
A.K.
# The above fm model provides output as:
# Formula: Response ~ Wavelength * Start_Resp * time +
## (1 | resid) + (1 + time | Subject)
# Data: BehavdatOrig
# AIC BIC logLik deviance
# 1323 1413 -645.7 1291
# Random effects:
# Groups Name Variance Std.Dev. Corr
# resid (Intercept) 8.3103e-12 2.8828e-06
# Subject (Intercept) 1.6026e+01 4.0033e+00
# time 1.2760e-01 3.5722e-01 -0.552
# Number of obs: 1960, groups: resid, 1960; Subject, 98
#
# I also tried uncorrelated model fma.
#
# Formula: Response ~ Wavelength * Start_Resp * time +
## (1 | resid) +
## (1 | Subject) + (0 + time |
# Subject)
# Data: BehavdatOrig
# AIC BIC logLik deviance
# 1333 1417 -651.6 1303
# Random effects:
# Groups Name Variance Std.Dev.
# resid (Intercept) 6.9660e-14 2.6393e-07
# Subject time 1.0964e-01 3.3112e-01
# Subject (Intercept) 1.1732e+01 3.4252e+00
# et results:
#
# Anova comparison favors the correlated model:
# > anova(fma,fm)
# Data: BehavdatOrig
# Models:
# fma: Response ~ Wavelength * Start_Resp * time + (1 | resid) + (1 |
# fma: Subject) + (0 + time | Subject)
# fm: Response ~ Wavelength * Start_Resp * time + (1 | resid) + (1 +
# fm: time | Subject)
# Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)
# fma 15 1333.2 1416.9 -651.57
# fm 16 1323.4 1412.7 -645.69 11.767 1 0.0006031 ***
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
More information about the R-sig-mixed-models
mailing list