[R-sig-ME] lmer formula syntax?

Tue Mar 8 01:53:23 CET 2011

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 11-03-07 06:42 PM, Colin Wahl wrote:
> Gavin,
> Thanks for the comment. I have since come to the understanding that my data
> does indeed have a binomial distribution. It is proportional data. It was a
> little weird in that I had to create an aggregate variable from the
> successes (#of EPT per site) and failures (number of invertebrates not EPT)
> using the following code.
> E<-cbind(eptCNT, n-eptCNT)
> 
> my current model is of the form:
> modelB<-glmer(E ~ wsh*rip + (1|stream) + (1|stream:rip), data=ept,
> family=binomial(link="logit"))
> 
> Thanks for the heads up, anyway, I appreciate it. I've been dealing with
> overdispersion in another thread, which has become quite the goose chase!
> 
> Colin
> 

   Colin:

    I believe, if it makes more sense to you, that you can also specify
successes as a proportion (ie. eptCNT/n) as long as you use the
'weights' argument to specify the denominator size.

   From the ?lmer examples (with a new bit added which will eventually
be incorporated in the package):

library(lme4)
cbpp$obs <- 1:nrow(cbpp)
(gm2 <- glmer(cbind(incidence, size - incidence) ~ period +
    (1 | herd) +  (1|obs),
              family = binomial, data = cbpp))

## same model, specified with proportion on LHS of formula
##   and using weights to specify denominator
gm2W <- glmer(incidence/size ~ period +
    (1 | herd) +  (1|obs),
              family = binomial,
              weights=size, data = cbpp)

all.equal(coef(gm2),coef(gm2W))


> 
> On Mon, Mar 7, 2011 at 1:45 AM, Gavin Simpson <gavin.simpson at ucl.ac.uk>wrote:
> 
>> On Tue, 2011-02-01 at 14:14 -0800, Colin Wahl wrote:
>>> Dear r-sig-mixed-models List,
>>>
>>> I've been digging through the r-help archive and various books in search
>> of
>>> an answer to my query. I have been thus far unsuccessful. I am not new to
>> R,
>>> but have primarily focused on multivariate non parametric analyses in the
>>> past; my experience with lmer has been confined to the last few weeks.
>> The
>>> model formulas I have written work and I have some idea of which are
>>> accurate/best based on AIC scores, but I would very much appreciate an
>>> informed opinion on my syntax.
>>>
>>> I am performing an ecological study of stream health using %EPT (An
>>> aggregate relative abundance variable for invertebrates) as my response
>>> variable. I am interested in how invertebrate counts vary between
>> different
>>> watershed types (forested, cultivated, developed, grassland) and reach
>> types
>>> (forested, non forested). There are a total of 12 streams, 12 watersheds
>> and
>>> 24 reaches. The wshed factor is unbalanced.
>>
>> Hi Colin,
>>
>> I have very little to add to the discussion thus far (from which I have
>> learned a lot by the way!), except to question the use of family =
>> gaussian for this response? %EPT will be bounded at zero (there will be
>> samples for locations that are so polluted as to not have any of these
>> critters living there), and potentially bounded at 100% (theoretically
>> at least, although I find it hard to see how one would get 100% of this
>> group of inverts in a kick sample). The errors may depart from the
>> assumptions of the LMM.
>>
>> If you know the total counts then perhaps a binomial GLMM would be
>> appropriate. If not, beta regression via the betareg package might be an
>> alternative - the sandwich package can be used to  fix up the covariance
>> matrix to account for the clustering in the data.
>>
>> Just my tuppence,
>>
>> G
>>
>>> Data structure:
>>>> str(ept)
>>> 'data.frame':    72 obs. of  4 variables:
>>>  $ wshed  : Factor w/ 4 levels "c","d","f","g": 1 1 1 1 1 1 1 1 1 1 ...
>>>  $ stream : Factor w/ 12 levels "BA","D1","D2",..: 4 4 4 6 6 6 10 10 10 4
>>> ...
>>>  $ riparia: Factor w/ 2 levels "F","N": 1 1 1 1 1 1 1 1 1 2 ...
>>>  $ ept    : num  0.318 0.156 0.194 0.146 0.158 ...
>>>
>>> stream is the only random variable and it is nested (not implicitly) in
>>> wshed. My standard statistical model is as follows:
>>> ept(ijkl) =  + wshedi  + stream(i)j + ripariak + wshed*ripariaik +
>>> stream*riparia(i)jk + (ijk)l
>>>
>>> The best option I have come up with thus far is:
>>>> model<-lmer(ept ~ wshed + riparia + wshed:riparia + (1| wshed/stream) +
>> (1|
>>> stream:riparia), data=ept, family="gaussian")
>>> or alternatively:
>>>> model<-lmer(ept ~ wshed*riparia + (1| wshed/stream) + (1|
>> stream:riparia),
>>> data=ept, family="gaussian")
>>>
>>> Question: Am I specifying the random terms and nesting structure
>> correctly?
>>>
>>> Secondarily: I have read a few times that "wshed/stream" is
>> interchangeable
>>> with  "wshed:stream" which is a meaningless interaction. Also I've seen
>> that
>>> random effects are specified as (a|b) where a is a covariate and b is a
>>> grouping factor. Does having 1 as a covariate simply specifying an
>> intercept
>>> of 1? What is the purpose of placing a factor in the place of 1 as a
>>> covariate? OR is there a nice complete summary tutorial that I've missed.
>>>
>>>
>>> I'm looking forward to hearing any comments.
>>>
>>> Thank you,
>>> _______________________________________________
>>> R-sig-mixed-models at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>> --
>> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>>  Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
>>  ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
>>  Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
>>  Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
>>  UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
>> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>>
>>
> 
> 

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk11fgIACgkQc5UpGjwzenOriACdFxOI4VSNc2XbGBVJIdEk6TiD
Q/UAnRrqEbNBklgY71ciaudcnGeQtAYo
=prvZ
-----END PGP SIGNATURE-----