[R-sig-ME] lmer formula syntax?
Gavin Simpson
gavin.simpson at ucl.ac.uk
Mon Mar 7 10:45:24 CET 2011
On Tue, 2011-02-01 at 14:14 -0800, Colin Wahl wrote:
> Dear r-sig-mixed-models List,
>
> I've been digging through the r-help archive and various books in search of
> an answer to my query. I have been thus far unsuccessful. I am not new to R,
> but have primarily focused on multivariate non parametric analyses in the
> past; my experience with lmer has been confined to the last few weeks. The
> model formulas I have written work and I have some idea of which are
> accurate/best based on AIC scores, but I would very much appreciate an
> informed opinion on my syntax.
>
> I am performing an ecological study of stream health using %EPT (An
> aggregate relative abundance variable for invertebrates) as my response
> variable. I am interested in how invertebrate counts vary between different
> watershed types (forested, cultivated, developed, grassland) and reach types
> (forested, non forested). There are a total of 12 streams, 12 watersheds and
> 24 reaches. The wshed factor is unbalanced.
Hi Colin,
I have very little to add to the discussion thus far (from which I have
learned a lot by the way!), except to question the use of family =
gaussian for this response? %EPT will be bounded at zero (there will be
samples for locations that are so polluted as to not have any of these
critters living there), and potentially bounded at 100% (theoretically
at least, although I find it hard to see how one would get 100% of this
group of inverts in a kick sample). The errors may depart from the
assumptions of the LMM.
If you know the total counts then perhaps a binomial GLMM would be
appropriate. If not, beta regression via the betareg package might be an
alternative - the sandwich package can be used to fix up the covariance
matrix to account for the clustering in the data.
Just my tuppence,
G
> Data structure:
> > str(ept)
> 'data.frame': 72 obs. of 4 variables:
> $ wshed : Factor w/ 4 levels "c","d","f","g": 1 1 1 1 1 1 1 1 1 1 ...
> $ stream : Factor w/ 12 levels "BA","D1","D2",..: 4 4 4 6 6 6 10 10 10 4
> ...
> $ riparia: Factor w/ 2 levels "F","N": 1 1 1 1 1 1 1 1 1 2 ...
> $ ept : num 0.318 0.156 0.194 0.146 0.158 ...
>
> stream is the only random variable and it is nested (not implicitly) in
> wshed. My standard statistical model is as follows:
> ept(ijkl) = + wshedi + stream(i)j + ripariak + wshed*ripariaik +
> stream*riparia(i)jk + (ijk)l
>
> The best option I have come up with thus far is:
> >model<-lmer(ept ~ wshed + riparia + wshed:riparia + (1| wshed/stream) + (1|
> stream:riparia), data=ept, family="gaussian")
> or alternatively:
> >model<-lmer(ept ~ wshed*riparia + (1| wshed/stream) + (1| stream:riparia),
> data=ept, family="gaussian")
>
> Question: Am I specifying the random terms and nesting structure correctly?
>
> Secondarily: I have read a few times that "wshed/stream" is interchangeable
> with "wshed:stream" which is a meaningless interaction. Also I've seen that
> random effects are specified as (a|b) where a is a covariate and b is a
> grouping factor. Does having 1 as a covariate simply specifying an intercept
> of 1? What is the purpose of placing a factor in the place of 1 as a
> covariate? OR is there a nice complete summary tutorial that I've missed.
>
>
> I'm looking forward to hearing any comments.
>
> Thank you,
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Dr. Gavin Simpson [t] +44 (0)20 7679 0522
ECRC, UCL Geography, [f] +44 (0)20 7679 0565
Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
More information about the R-sig-mixed-models
mailing list