[R-sig-ME] lmer formula syntax?
bbolker at gmail.com
Tue Mar 8 01:53:23 CET 2011
-----BEGIN PGP SIGNED MESSAGE-----
On 11-03-07 06:42 PM, Colin Wahl wrote:
> Thanks for the comment. I have since come to the understanding that my data
> does indeed have a binomial distribution. It is proportional data. It was a
> little weird in that I had to create an aggregate variable from the
> successes (#of EPT per site) and failures (number of invertebrates not EPT)
> using the following code.
> E<-cbind(eptCNT, n-eptCNT)
> my current model is of the form:
> modelB<-glmer(E ~ wsh*rip + (1|stream) + (1|stream:rip), data=ept,
> Thanks for the heads up, anyway, I appreciate it. I've been dealing with
> overdispersion in another thread, which has become quite the goose chase!
I believe, if it makes more sense to you, that you can also specify
successes as a proportion (ie. eptCNT/n) as long as you use the
'weights' argument to specify the denominator size.
From the ?lmer examples (with a new bit added which will eventually
be incorporated in the package):
cbpp$obs <- 1:nrow(cbpp)
(gm2 <- glmer(cbind(incidence, size - incidence) ~ period +
(1 | herd) + (1|obs),
family = binomial, data = cbpp))
## same model, specified with proportion on LHS of formula
## and using weights to specify denominator
gm2W <- glmer(incidence/size ~ period +
(1 | herd) + (1|obs),
family = binomial,
weights=size, data = cbpp)
> On Mon, Mar 7, 2011 at 1:45 AM, Gavin Simpson <gavin.simpson at ucl.ac.uk>wrote:
>> On Tue, 2011-02-01 at 14:14 -0800, Colin Wahl wrote:
>>> Dear r-sig-mixed-models List,
>>> I've been digging through the r-help archive and various books in search
>>> an answer to my query. I have been thus far unsuccessful. I am not new to
>>> but have primarily focused on multivariate non parametric analyses in the
>>> past; my experience with lmer has been confined to the last few weeks.
>>> model formulas I have written work and I have some idea of which are
>>> accurate/best based on AIC scores, but I would very much appreciate an
>>> informed opinion on my syntax.
>>> I am performing an ecological study of stream health using %EPT (An
>>> aggregate relative abundance variable for invertebrates) as my response
>>> variable. I am interested in how invertebrate counts vary between
>>> watershed types (forested, cultivated, developed, grassland) and reach
>>> (forested, non forested). There are a total of 12 streams, 12 watersheds
>>> 24 reaches. The wshed factor is unbalanced.
>> Hi Colin,
>> I have very little to add to the discussion thus far (from which I have
>> learned a lot by the way!), except to question the use of family =
>> gaussian for this response? %EPT will be bounded at zero (there will be
>> samples for locations that are so polluted as to not have any of these
>> critters living there), and potentially bounded at 100% (theoretically
>> at least, although I find it hard to see how one would get 100% of this
>> group of inverts in a kick sample). The errors may depart from the
>> assumptions of the LMM.
>> If you know the total counts then perhaps a binomial GLMM would be
>> appropriate. If not, beta regression via the betareg package might be an
>> alternative - the sandwich package can be used to fix up the covariance
>> matrix to account for the clustering in the data.
>> Just my tuppence,
>>> Data structure:
>>> 'data.frame': 72 obs. of 4 variables:
>>> $ wshed : Factor w/ 4 levels "c","d","f","g": 1 1 1 1 1 1 1 1 1 1 ...
>>> $ stream : Factor w/ 12 levels "BA","D1","D2",..: 4 4 4 6 6 6 10 10 10 4
>>> $ riparia: Factor w/ 2 levels "F","N": 1 1 1 1 1 1 1 1 1 2 ...
>>> $ ept : num 0.318 0.156 0.194 0.146 0.158 ...
>>> stream is the only random variable and it is nested (not implicitly) in
>>> wshed. My standard statistical model is as follows:
>>> ept(ijkl) = + wshedi + stream(i)j + ripariak + wshed*ripariaik +
>>> stream*riparia(i)jk + (ijk)l
>>> The best option I have come up with thus far is:
>>>> model<-lmer(ept ~ wshed + riparia + wshed:riparia + (1| wshed/stream) +
>>> stream:riparia), data=ept, family="gaussian")
>>> or alternatively:
>>>> model<-lmer(ept ~ wshed*riparia + (1| wshed/stream) + (1|
>>> data=ept, family="gaussian")
>>> Question: Am I specifying the random terms and nesting structure
>>> Secondarily: I have read a few times that "wshed/stream" is
>>> with "wshed:stream" which is a meaningless interaction. Also I've seen
>>> random effects are specified as (a|b) where a is a covariate and b is a
>>> grouping factor. Does having 1 as a covariate simply specifying an
>>> of 1? What is the purpose of placing a factor in the place of 1 as a
>>> covariate? OR is there a nice complete summary tutorial that I've missed.
>>> I'm looking forward to hearing any comments.
>>> Thank you,
>>> R-sig-mixed-models at r-project.org mailing list
>> Dr. Gavin Simpson [t] +44 (0)20 7679 0522
>> ECRC, UCL Geography, [f] +44 (0)20 7679 0565
>> Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
>> Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/
>> UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
-----END PGP SIGNATURE-----
More information about the R-sig-mixed-models