[R-sig-ME] A sledgehammer to crack a nut?

Tue Sep 13 17:00:34 CEST 2016

Dear Quentin,

Since your response variable contains counts, you can't use ANOVA, which
assumes residuals with a Gaussian distribution.

Year is conceptually a random effect. But with only two levels you run into
numerical problems when estimating its variance. Hence it is better to add
it to the fixed effects.

So I'd go for

glmer(response ~ E1 * E2 + E3 + (1|R1), data, family = poisson)
glmer.nb(response ~ E1 * E2 + E3 + (1|R1), data)
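
A minimal sketch of how these two fits could be compared, assuming the lme4
package is installed. The data are simulated purely for illustration: the
level names ("year1", "year2"), the effect sizes and the object names (dat,
m_pois, m_nb, overdisp) are my assumptions, not Quentin's data.

```r
library(lme4)

set.seed(1)

# Hypothetical design: 5 trees per exposure x habitat combination = 30 trees
trees <- expand.grid(rep = 1:5,
                     E1 = c("no", "yes"),
                     E2 = c("wood", "farmland", "urban"))
trees$R1 <- factor(seq_len(nrow(trees)))

# Each tree is sampled 3 times in each of 2 years: 30 x 2 x 3 = 180 rows
visits <- expand.grid(R1 = trees$R1,
                      E3 = c("year1", "year2"),
                      sample = 1:3)
dat <- merge(trees, visits, by = "R1")

# Simulated overdispersed counts with a tree-level random intercept
# (all numeric values here are arbitrary illustration values)
tree_effect <- rnorm(nlevels(dat$R1), sd = 0.3)
eta <- 2 - 0.5 * (dat$E1 == "yes") + tree_effect[as.integer(dat$R1)]
dat$response <- rnbinom(nrow(dat), mu = exp(eta), size = 2)

# The two candidate models from the message above
m_pois <- glmer(response ~ E1 * E2 + E3 + (1 | R1), data = dat,
                family = poisson)
m_nb <- glmer.nb(response ~ E1 * E2 + E3 + (1 | R1), data = dat)

# Rough overdispersion check for the Poisson fit: a ratio of Pearson
# chi-square to residual degrees of freedom well above 1 points towards
# the negative binomial model
overdisp <- sum(residuals(m_pois, type = "pearson")^2) / df.residual(m_pois)
print(overdisp)
print(AIC(m_pois, m_nb))
```

With real data, replace dat by your own data frame; the ratio and the AIC
comparison are only rough guides, not a formal test.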

Note that E1 * E2 * E3 is much more complex than E1 * E2 + (1|E3) in
terms of model fit.
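
One way to see the difference in complexity is to count the fixed-effect
coefficients each formula implies. This is a sketch in base R; the level
names are assumptions for illustration, only the number of levels per
factor (2, 3 and 2) comes from the description of the design.

```r
# One row per combination of factor levels (hypothetical level names)
design <- expand.grid(E1 = c("no", "yes"),
                      E2 = c("wood", "farmland", "urban"),
                      E3 = c("year1", "year2"))

# Number of fixed-effect coefficients implied by each fixed-effects formula
n_full <- ncol(model.matrix(~ E1 * E2 * E3, design))  # full three-way interaction
n_add  <- ncol(model.matrix(~ E1 * E2 + E3, design))  # year as additive fixed effect
n_int  <- ncol(model.matrix(~ E1 * E2, design))       # year moved to (1|E3)

print(c(full = n_full, additive = n_add, year_random = n_int))
```

The model with (1|E3) additionally estimates one variance parameter for
year, and every model above estimates one variance for the tree intercept.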

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

2016-09-13 16:00 GMT+02:00 John Sorkin <jsorkin op grecc.umaryland.edu>:

> Quentin,
>
> A general comment.
>
> Accounting for repeated measures taken from the same observational unit
> is needed only when three or more measurements
> have been obtained. When there are only two measurements one can either
> model change (i.e. post-pre) or post alone without
> any use of repeated measures theory or software. In fact, if one uses
> repeated measures ANOVA when there are only two measurements,
> the analysis "devolves" into a non-repeated measures analysis. When we
> wish to model two measurements the model can be
> specified in many ways including:
> change (post-pre) = group
> change =group + pre
> post = group (this should be used with care, as it assumes that the pre
> value is the same in all experimental groups)
> post = group + pre
>
> You will note that all the models listed above have at most a single value
> of the outcome of interest on the right side
> of the equals sign; further, there is no indication of the time at which
> the observation was obtained on the right side of the equals
> sign. If you need to have two or more values of the outcome of interest
> on the right side of the equals sign, and thus
> need a variable to indicate the time at which the observation was
> obtained, you need to use repeated measures techniques
> and repeated measures analyses. For example if there are three
> measurements obtained from each observational unit,
> you would need a model something like the following:
> value = group + time, where time might equal 0,  1, and 2.
>
> John
>
>
> John David Sorkin M.D., Ph.D.
> Professor of Medicine
> Chief, Biostatistics and Informatics
> University of Maryland School of Medicine Division of Gerontology and
> Geriatric Medicine
> Baltimore VA Medical Center
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
> (Phone) 410-605-7119
> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
> >>> "Quentin Schorpp" <quentin.schorpp op thuenen.de> 09/13/16 9:18 AM >>>
> Hello,
>
> I have trouble with the term "repeated measurements" since I started to
> use statistics. During my time as a scientist I never saw an experiment
> where time-repeated measurements are NOT involved. Normally there are
> either before/after measurements, time series to investigate the
> development of a measurement variable, or the repetition of a certain
> investigation in consecutive years. Therefore I'm already wondering why
> most people start
> learning basic statistics and repeated measurement is always declared as
> the "hard stuff" for self training in the future.
>
> Now I’ve got data from an experiment repeatedly conducted in 2
> consecutive years.
>
> The measurements are from trees; there are five trees exposed/not exposed
> at each habitat (5 x 2 x 3 = 30 trees). From each tree three samples were
> taken (i.e. n=3 pseudoreplicates). Considering the repetition in the two
> years there are n=6 pseudoreplicates, right? And total n = 6 x 30 = 180.
> Summary: 10 trees at each of three habitats, either exposed or not exposed
> to blue tits. Each tree was measured three times. The whole experiment was
> repeated two times. Balanced sample design.
>
> The response variable is count data (of larvae and pupae of a moth).
> The explanatory variables are: E1) exposure to blue tits (factor,
> yes/no); E2) the type of habitat (wood, farmland, urban) and E3) the
> year in which the experiment was conducted.
>
> The random variables are R1) the tree (factor, ID 1-30) [and R2) the
> year in which the experiment was conducted]
>
> In my opinion, a quite simple study design. Now, I am interested in (all
> the possible ways of) analysis of the following Hypotheses:
> H1 = blue tits reduce the number of larvae on the trees
> H0 = There are no differences in the number of pupae/larvae between trees
> exposed to blue tits and trees not exposed
> Additionally I am interested in the influence of habitat type on H1 and
> H0
>
> I learned that the best way to solve problems with repeated measurements
> is to use mixed effects models.
>
> My model:
> lmer(response ~ E1 * E2 + (1|E3) + (1|R1), data)
> and if I’m interested in differences according to the years:
> lmer(response ~ E1 * E2 * E3 + (1|R1), data)
>
> Questions:
> Is that right, or is it better to use two ANOVAs, one for each year,
> on the means for the trees, just because everybody can understand it?
> What would be the analysis of choice if the residuals are not normally
> distributed or are heteroscedastic? Or: do non-parametric tests not need
> to consider random effects?
>
> Kind regards,
> Quentin
>
> _______________________________________________
> R-sig-mixed-models op r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models