[R-sig-ME] How to correctly specify a mixed model
Ben Bolker
bbolker at gmail.com
Tue Feb 23 02:27:15 CET 2016
You should get exactly the same answer whichever way you do it (try
it!). The only thing you lose by aggregating is the estimate of the
variance among observations within areas (which you might not care about
anyway). The advantage is a simpler model, which is easier to do
inference on and harder to screw up. This is the idea of Murtaugh's 2007
paper in Ecology, "Simplicity and Complexity in Ecological Data
Analysis". The only reasons *not* to aggregate would be:
- you're interested in the within-area variance;
- you're doing a GLMM (count/binary responses can't always be aggregated
as simply as Normal responses);
- you have individual-level covariates that vary within areas;
- you have unbalanced data (this can often be handled by assigning
non-equal weights).
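As a rough illustration of the weighting idea in the last item, here is a
minimal sketch in base R with simulated unbalanced data. All names and numbers
are made up and are not the poster's data. It assumes weights equal to the
number of individuals per area, which reproduces the coefficients of an
individual-level lm() without a random effect; matching a mixed model exactly
would need weights that also depend on the estimated variance components.

```r
set.seed(101)

# Hypothetical area-level covariates: 7 areas with unequal sample sizes
area_info <- data.frame(
  Area        = LETTERS[1:7],
  Temperature = c(15, 25, 18, 22, 30, 12, 27),
  Rain        = c(300, 500, 350, 420, 610, 280, 390),
  n           = c(10, 12, 20, 9, 15, 14, 16)  # sums to 96
)

# Expand to one row per individual and simulate growth (invented coefficients)
Data <- area_info[rep(seq_len(nrow(area_info)), area_info$n),
                  c("Area", "Temperature", "Rain")]
Data$Growth <- 2 + 0.5 * Data$Temperature + 0.01 * Data$Rain +
  rnorm(nrow(Data), sd = 3)

# Aggregate to one mean per area, keeping the group size n for weighting
AveragedData <- area_info
AveragedData$AveragedGrowth <-
  as.numeric(tapply(Data$Growth, Data$Area, mean)[area_info$Area])

# Individual-level fit vs. weighted fit on the area means
fit_ind <- lm(Growth ~ Temperature + Rain, data = Data)
fit_wtd <- lm(AveragedGrowth ~ Temperature + Rain, data = AveragedData,
              weights = n)

# The two sets of coefficients agree up to numerical tolerance
all.equal(coef(fit_ind), coef(fit_wtd))
```

Dropping the weights argument in the second fit would treat every area mean
as equally precise, which is only appropriate when the areas have the same
number of individuals.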
A sample size of 7 is indeed somewhat low for a regression with 2
inputs, but whether you aggregate or not won't make a difference.
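To try this out, here is a minimal sketch with simulated balanced data: 7
hypothetical areas with 12 individuals each, with all values invented for
illustration. It assumes the lme4 package is installed. In the balanced case
the fixed-effect estimates from lmer() and from lm() on the area means should
agree to numerical tolerance. The standard errors are expected to match as
well, provided the among-area variance is not estimated as zero.

```r
library(lme4)

set.seed(1)
n_per_area <- 12  # balanced design (an assumption of this sketch)

# Hypothetical area-level covariates
area_info <- data.frame(
  Area        = LETTERS[1:7],
  Temperature = c(15, 25, 18, 22, 30, 12, 27),
  Rain        = c(300, 500, 350, 420, 610, 280, 390)
)
area_effect <- rnorm(7, sd = 4)  # simulated among-area variation

# One row per individual; growth = fixed part + area effect + noise
Data <- area_info[rep(1:7, each = n_per_area), ]
Data$Growth <- 2 + 0.5 * Data$Temperature + 0.01 * Data$Rain +
  rep(area_effect, each = n_per_area) + rnorm(nrow(Data), sd = 3)

# Aggregate to one mean growth value per area
AveragedData <- aggregate(Growth ~ Area + Temperature + Rain,
                          data = Data, FUN = mean)
names(AveragedData)[names(AveragedData) == "Growth"] <- "AveragedGrowth"

# Mixed model on individuals vs. simple regression on area means
m_mixed <- lmer(Growth ~ Temperature + Rain + (1 | Area), data = Data)
m_avg   <- lm(AveragedGrowth ~ Temperature + Rain, data = AveragedData)

# Side-by-side fixed-effect estimates
cbind(lmer = fixef(m_mixed), lm = coef(m_avg))
```

lmer() may warn that the predictors are on very different scales; rescaling
Rain (for example to hundreds of mm) should silence that without changing the
comparison.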
On 16-02-22 05:39 PM, christos mammides wrote:
> Dear all,
>
> I have a possibly naïve question on how to correctly specify a mixed
> model. I would appreciate any help you can provide.
>
> Let’s say I have data on plant growth from several individuals from 7
> different areas (n=96), and I want to test the effect of two climatic
> variables (temperature and rain) on growth. For each of the 7 areas I
> have one measurement for temperature and one for rain. For example, the
> first few lines of my data look like this:
>
> Individual  Growth  Temperature  Rain  Area
> 1           10      15           300   A
> 2           12      15           300   A
> 3           20      15           300   A
> 4           16      25           500   B
> 5           29      25           500   B
> 6           10      25           500   B
> …           …       …            …     …
>
> Would the following model be appropriate (in terms of the way the random
> effect is specified)?
>
> Model <- lmer(Growth~Temperature+Rain+(1|Area), data=Data)
>
> It was suggested to me that since I only have one measurement for each
> climatic variable per area it’s probably better to take the average of
> the plant growth for each area and run a simple regression model such as
> this: Model <- lm(AveragedGrowth~Temp+Rain, data=AveragedData).
>
> Am I right to think that in doing that I am losing information, by
> averaging my plant growth data, and I am also reducing my sample size
> (n=7) to a point that it would be too difficult to run a regression?
>
> Hope my question makes sense.
>
> Thank you in advance,
>
> Christos
