[R-sig-ME] A sledgehammer to crack a nut?

Mon Sep 12 15:22:45 CEST 2016

Hello,

I have had trouble with the term "repeated measurements" ever since I
started using statistics. During my time as a scientist I have never seen
an experiment where repeated measurements over time are NOT involved.
Normally there are either before/after measurements, time series to follow
the development of a measured variable, or the repetition of an
investigation in consecutive years. So I wonder why most people start by
learning basic statistics, while repeated measurements are always declared
the "hard stuff" left for self-training later.

Now I’ve got data from an experiment that was conducted in 2 consecutive
years.

The measurements are from trees: at each of three habitats there are five
trees exposed and five trees not exposed (5x2x3 = 30 trees). From each
tree three samples were taken (i.e. n=3 pseudoreplicates). Considering the
repetition in the two years there are n=6 pseudoreplicates per tree,
right? And total n = 6 x 30 = 180.
Summary: 10 trees at each of three habitats, either exposed or not exposed
to blue tits. Each tree was measured three times per year. The whole
experiment was carried out in two years. Balanced sampling design.
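
To double-check the arithmetic, the design described above can be laid out
as a grid in base R. This is only a sketch: the column names and the year
labels are placeholders I made up, not the names in my data file.

```r
# Hypothetical reconstruction of the design; all names are assumptions.
trees <- expand.grid(
  rep      = 1:5,                             # five trees per cell
  exposure = c("yes", "no"),                  # E1
  habitat  = c("wood", "farmland", "urban")   # E2
)
trees$tree <- factor(seq_len(nrow(trees)))    # R1: unique tree ID 1-30

# Cross every tree with two years and three samples per year
# (by = NULL makes merge() return the Cartesian product).
visits <- expand.grid(year = c("year1", "year2"), sample = 1:3)
obs <- merge(trees, visits, by = NULL)

nrow(trees)      # number of trees
nrow(obs)        # number of observations
table(obs$tree)  # observations (pseudoreplicates) per tree
```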

The response variable is count data (larvae and pupae of a moth).
The explanatory variables are: E1) exposure to blue tits (factor,
yes/no); E2) the type of habitat (wood, farmland, urban) and E3) the year
in which the experiment was conducted.

The random variables are R1) the tree (factor, ID 1-30) [and R2) the year
in which the experiment was conducted]

In my opinion, a quite simple study design. Now I am interested in (all
the possible ways of) analysing the following hypotheses:
H1 = blue tits reduce the number of larvae on the trees
H0 = there is no difference in the number of pupae/larvae between trees
exposed to blue tits and trees not exposed
Additionally I am interested in the influence of habitat type on H1 and H0.

I learned that the best way to solve problems with repeated measurements
is to use mixed effects models.

My model:
lmer(response ~ E1 * E2 + (1|E3) + (1|R1), data)
and if I’m interested in differences between the years:
lmer(response ~ E1 * E2 * E3 + (1|R1), data)
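
For anyone who wants to try the two calls, here is a minimal sketch that
fits them to simulated placeholder data with the same layout (30 trees, 6
counts each). The counts are made-up Poisson noise, not my measurements, so
the estimates mean nothing and singular-fit messages are to be expected.
The lme4 package is assumed to be installed. The last call, a Poisson GLMM
via glmer(), is only one alternative I could imagine for count data, not
something I have settled on.

```r
library(lme4)  # assumed to be installed

set.seed(1)
# Simulated placeholder data; variable names follow the formulas above.
d <- expand.grid(
  sample = 1:3,
  E3     = factor(c("year1", "year2")),            # year (labels are made up)
  rep    = 1:5,
  E1     = factor(c("no", "yes")),                 # exposure to blue tits
  E2     = factor(c("wood", "farmland", "urban"))  # habitat
)
d$R1 <- interaction(d$rep, d$E1, d$E2, drop = TRUE)  # 30 unique trees
d$response <- rpois(nrow(d), lambda = 5)             # fake counts, no true effect

# Year as a random intercept (only two levels, so its variance is hard to estimate)
m1 <- lmer(response ~ E1 * E2 + (1 | E3) + (1 | R1), data = d)

# Year as a fixed effect, tree as a random intercept
m2 <- lmer(response ~ E1 * E2 * E3 + (1 | R1), data = d)

# Possible alternative for counts: Poisson GLMM with the same structure
m3 <- glmer(response ~ E1 * E2 * E3 + (1 | R1), data = d, family = poisson)

summary(m2)
```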

Questions:
Is that right, or am I using a sledgehammer to crack a nut?
Would it be better to run a separate ANOVA for each year on the per-tree
means, just because everybody can understand it?
What would be the analysis of choice if the residuals are not normally
distributed or are heteroscedastic? Or: do non-parametric tests not need
to consider random effects?

Kind regards,
Quentin