[R-sig-eco] Regression with few observations per factor level

Jari Oksanen jari.oksanen at oulu.fi
Thu Oct 23 22:00:05 CEST 2014


On 23/10/2014, at 18:17 PM, Gavin Simpson wrote:

> On 22 October 2014 17:24, Chris Howden <chris at trickysolutions.com.au> wrote:
> 
>> A good place to start is by looking at your residuals to determine whether
>> the normality assumptions are being met. If not, then some form of GLM
>> that correctly models the residuals, or a non-parametric method, should
>> be used.
>> 
> 
> Doing that could be very tricky indeed; I defy anyone, without knowledge of
> how the data were generated, to detect departures from normality in such a
> small data set. Try qqnorm(rnorm(4)) a few times and you'll see what I mean.
> 
> Second, one usually considers the distribution of the response when fitting
> a GLM, rather than deciding whether residuals from an LM are non-Gaussian
> and then moving on.
> The decision to use the GLM should be motivated directly from the data and
> question to hand. Perhaps sometimes we can get away with fitting the LM,
> but that usually involves some thought, in which case one has probably
> already thought about the GLM as well.
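Here is a minimal R sketch of the experiment Gavin suggests. It is an illustration, not code from the thread. It draws six samples of size 4 from a standard normal and plots their Q-Q plots. As an extra check, it then estimates how often `shapiro.test()` rejects normality for samples of the same size drawn from a strongly skewed exponential distribution. The seed, the number of panels and the number of replicates are arbitrary choices.

```r
## Every sample below is Gaussian by construction, yet with n = 4 the
## normal Q-Q plots look very different from one draw to the next.
set.seed(42)                      # arbitrary seed, for reproducibility only
op <- par(mfrow = c(2, 3))        # 2 x 3 grid of panels
for (i in 1:6) {
  x <- rnorm(4)
  qqnorm(x, main = paste("rnorm(4), draw", i))
  qqline(x)
}
par(op)                           # restore previous graphics settings

## A formal test has little power at this sample size as well: estimate how
## often the Shapiro-Wilk test (alpha = 0.05) rejects normality when the
## four observations come from an exponential distribution.
nsim <- 2000
p <- replicate(nsim, shapiro.test(rexp(4))$p.value)
power_exp <- mean(p < 0.05)       # estimated rejection rate (power)
power_exp
```

The rejection rate stays low even though the exponential data are clearly non-normal. This is the point Gavin makes about tiny samples.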

I agree completely with Gavin. If you have four data points, fit a two-parameter linear model and, in addition, select a one-parameter exponential family distribution (as implied in selecting a GLM family), you don't have many degrees of freedom left: at most 4 - 2 = 2 residual degrees of freedom. I don't think you would get such models accepted in many journals. Forget the regression and get more data. Some people here suggested that an acceptable model could be possible if your data points are not single observations but means of several observations. That is true: then you can proceed, but consult a statistician on how to proceed.
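The degrees-of-freedom arithmetic can be checked directly in R. The sketch below uses four hypothetical data points; the values are made up purely for illustration. The straight-line fit spends two of the four observations on the intercept and slope, which leaves 4 - 2 = 2 residual degrees of freedom. A GLM with the same linear predictor leaves the same 2. For a family with a free dispersion parameter (a log-link Gamma is used here only as an example), that dispersion must also be estimated from those same two degrees of freedom.

```r
## Four hypothetical observations (illustrative values only)
x <- c(1, 2, 3, 4)
y <- c(2.1, 3.9, 6.2, 7.8)

## Gaussian linear model: intercept + slope = 2 coefficients
fit_lm <- lm(y ~ x)
length(coef(fit_lm))           # 2 coefficients estimated
df.residual(fit_lm)            # 4 - 2 = 2 residual degrees of freedom

## Same linear predictor in a GLM (Gamma family chosen only as an example);
## its dispersion parameter is estimated from the same 2 residual df.
fit_glm <- glm(y ~ x, family = Gamma(link = "log"))
df.residual(fit_glm)           # also 2
summary(fit_glm)$dispersion    # estimated from just 2 residual df
```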

Cheers, Jari Oksanen
