[R-sig-ME] How to correctly specify a mixed model

Ben Bolker bbolker at gmail.com
Tue Feb 23 02:27:15 CET 2016

    You should get exactly the same answer whichever way you do it (try 
it!).  The only thing you lose by aggregating is the estimate of the 
variance among observations within areas (which you might not care about 
anyway).  The advantage is a simpler model, which is easier to do 
inference on and harder to screw up. This is the idea of Murtaugh's 2007 
paper in Ecology, "Simplicity and Complexity in Ecological Data 
Analysis".  The only reasons *not* to aggregate would be:

- you're interested in the within-area variance;
- you're doing a GLMM (count/binary responses can't always be aggregated 
as simply as Normal responses);
- you have individual-level covariates that vary within areas;
- you have unbalanced data (this can often be handled by assigning 
unequal weights).

   A sample size of 7 is indeed somewhat low for a regression with 2 
inputs, but whether you aggregate or not won't make a difference.
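
   For anyone who wants to "try it", here is a minimal sketch of the comparison. It uses simulated, balanced data rather than the original data set: the number of individuals per area, the climate values, the effect sizes and the variances are all made-up assumptions. The variable names follow the question below, and lme4 is assumed to be installed.

```r
# Sketch only: simulated, balanced data (hypothetical values, not the poster's data)
library(lme4)

set.seed(101)
n_area <- 7
n_per  <- 12   # assumed: the same number of individuals in every area

# One temperature and one rain value per area (made-up numbers)
area_info <- data.frame(
  Area        = factor(LETTERS[1:n_area]),
  Temperature = c(15, 25, 18, 22, 30, 12, 27),
  Rain        = c(300, 500, 450, 350, 600, 250, 400)
)

# Repeat each area's row once per individual, then simulate growth with
# an among-area random effect plus within-area noise
Data <- area_info[rep(seq_len(n_area), each = n_per), ]
area_eff <- rnorm(n_area, sd = 2)
Data$Growth <- 5 + 0.4 * Data$Temperature + 0.01 * Data$Rain +
  area_eff[as.integer(Data$Area)] + rnorm(nrow(Data), sd = 3)

# Individual-level mixed model with a random intercept for area
m_mixed <- lmer(Growth ~ Temperature + Rain + (1 | Area), data = Data)

# Area-level regression on the mean growth per area
AveragedData <- aggregate(Growth ~ Area + Temperature + Rain,
                          data = Data, FUN = mean)
m_agg <- lm(Growth ~ Temperature + Rain, data = AveragedData)

# Fixed-effect estimates from the two fits, side by side
cbind(mixed = fixef(m_mixed), aggregated = coef(m_agg))
```

   With balanced data the two columns should agree up to numerical tolerance. The standard errors should also match as long as the estimated among-area variance is not zero; compare summary(m_mixed) with summary(m_agg) to check.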

On 16-02-22 05:39 PM, christos mammides wrote:
> Dear all,
> I have a possibly naïve question on how to correctly specify a mixed
> model. I would appreciate any help you can provide.
> Let’s say I have data on plant growth from several individuals from 7
> different areas (n=96), and I want to test the effect of two climatic
> variables (temperature and rain) on growth. For each of the 7 areas I
> have one measurement for temperature and one for rain. For example, the
> first few lines of my data look like this:
> Individual  Growth  Temperature  Rain  Area
> 1           10      15           300   A
> 2           12      15           300   A
> 3           20      15           300   A
> 4           16      25           500   B
> 5           29      25           500   B
> 6           10      25           500   B
> …           …       …            …     …
> Would the following model be appropriate (in terms of the way the random
> effect is specified)?
> Model <- lmer(Growth~Temperature+Rain+(1|Area), data=Data)
> It was suggested to me that since I only have one measurement for each
> climatic variable per area it’s probably better to take the average of
> the plant growth for each area and run a simple regression model such as
> this: Model <- lm(AveragedGrowth~Temp+Rain, data=AveragedData).
> Am I right to think that in doing that I am losing information, by
> averaging my plant growth data, and I am also reducing my sample size
> (n=7) to a point that it would be too difficult to run a regression?
> Hope my question makes sense.
> Thank you in advance,
> Christos