[R-sig-eco] Regression with few observations per factor level

Gavin Simpson ucfagls at gmail.com
Thu Oct 23 22:14:52 CEST 2014


I think there are actually 4 data points per level of some factor (after
seeing some of the other non-threaded emails - why can't people use email
clients that preserve threads?**); but yes, either way this is a small data
set and trying to decide whether the residuals are normal is going to be
nigh on impossible.
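
To put a rough number on that, here is a minimal sketch in R. It is an
illustration under assumptions, not an analysis of the actual data: it
treats the four values as a raw sample rather than as model residuals,
uses the Shapiro-Wilk test as a stand-in for eyeballing a QQ-plot, and
picks the exponential distribution arbitrarily as an example of clearly
non-Gaussian data. The helper name reject_rate is made up for this sketch.

```r
## How often does a normality test flag a sample of size 4?
## Assumptions: raw samples (not residuals), alpha = 0.05, and the
## exponential as an arbitrary example of strongly skewed data.
set.seed(1)

reject_rate <- function(rgen, n = 4, n_sim = 2000, alpha = 0.05) {
    ## rgen is a random generator such as rnorm or rexp
    rejected <- replicate(n_sim, shapiro.test(rgen(n))$p.value < alpha)
    mean(rejected)  # proportion of simulated samples flagged as non-normal
}

rate_normal <- reject_rate(rnorm)  # data that really are Gaussian
rate_skewed <- reject_rate(rexp)   # data that really are skewed

print(c(normal = rate_normal, skewed = rate_skewed))
```

If the rejection rate for the skewed data is not far above the rate for
the Gaussian data, the test (and, by extension, your eye) has little to
work with at n = 4.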

I like the suggestion that someone made to actually do some simulation to
work out whether you have any power to detect an effect of a given size;
it seems pointless doing the analysis if your conclusion would be "well, I
didn't detect an effect, but I have no power so I don't even know if I
should have been able to detect an effect if one were present". You'd be
no better off than if you hadn't run the analysis or collected the data.
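
A simulation along those lines might look like the sketch below. Every
number in it is an assumption chosen for illustration (three factor
levels, four observations per level, Gaussian errors with SD 1, and an
effect that shifts one level by 1 SD); none of them come from the real
data, so substitute values that are plausible for your own system. The
function name sim_power is likewise made up for this sketch.

```r
## Simulation-based power for a one-way design with few observations per level.
## All defaults are illustrative assumptions, not values from the real data.
set.seed(42)

sim_power <- function(n_per_level = 4, n_levels = 3, effect = 1, sd = 1,
                      n_sim = 1000, alpha = 0.05) {
    f  <- gl(n_levels, n_per_level)            # factor, n_per_level obs per level
    mu <- c(rep(0, n_levels - 1), effect)      # only the last level is shifted
    pvals <- replicate(n_sim, {
        y <- rnorm(length(f), mean = mu[as.integer(f)], sd = sd)
        anova(lm(y ~ f))[["Pr(>F)"]][1]        # p-value of the overall F test
    })
    mean(pvals < alpha)                        # proportion of significant fits
}

power_null  <- sim_power(effect = 0)   # no effect: should sit near alpha
power_1sd   <- sim_power(effect = 1)   # the assumed 1 SD shift
power_large <- sim_power(effect = 10)  # an implausibly large shift, as a check

print(c(null = power_null, one_sd = power_1sd, large = power_large))
```

The null and very-large-effect runs are there only as sanity checks on the
simulation itself; the middle one is the number you would actually care
about, for whatever effect size you consider ecologically meaningful.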

G

** He says, hoping to heck that GMail preserves the threading information...

On 23 October 2014 14:00, Jari Oksanen <jari.oksanen at oulu.fi> wrote:

>
> On 23/10/2014, at 18:17 PM, Gavin Simpson wrote:
>
> > On 22 October 2014 17:24, Chris Howden <chris at trickysolutions.com.au> wrote:
> >
> >> A good place to start is by looking at your residuals to determine if
> >> the normality assumptions are being met; if not, then some form of GLM
> >> that correctly models the residuals or a non-parametric method should
> >> be used.
> >>
> >
> > Doing that could be very tricky indeed; I defy anyone, without knowledge
> > of how the data were generated, to detect departures from normality in
> > such a small data set. Try qqnorm(rnorm(4)) a few times and you'll see
> > what I mean.
> >
> > Second, one usually considers the distribution of the response when
> > fitting a GLM, rather than deciding that residuals from an LM are
> > non-Gaussian and then moving on. The decision to use the GLM should be
> > motivated directly from the data and question to hand. Perhaps sometimes
> > we can get away with fitting the LM, but that usually involves some
> > thought, in which case one has probably already thought about the GLM
> > as well.
>
> I agree completely with Gavin. If you have four data points and fit a
> two-parameter linear model and in addition select a one-parameter
> exponential family distribution (as implied in selecting a GLM family) you
> don't have many degrees of freedom left. I don't think you get such models
> accepted in many journals. Forget the regression and get more data. Some
> people suggested here that an acceptable model could be possible if your
> data points are not single observations but means from several
> observations. That is true: then you can proceed, but consult a
> statistician on the way to proceed.
>
> Cheers, Jari Oksanen
>
>


-- 
Gavin Simpson, PhD