[R-sig-eco] Regression with few observations per factor level

Gavin Simpson ucfagls at gmail.com
Thu Oct 23 17:17:20 CEST 2014


On 22 October 2014 17:24, Chris Howden <chris at trickysolutions.com.au> wrote:

> A good place to start is to look at your residuals to determine whether
> the normality assumptions are met; if not, then some form of GLM
> that correctly models the residuals, or a non-parametric method, should
> be used.
>

Doing that could be very tricky indeed; I defy anyone, without knowledge of
how the data were generated, to detect departures from normality in such a
small data set. Try qqnorm(rnorm(4)) a few times and you'll see what I mean.
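To make that concrete, here is a minimal base-R sketch. The seed, the 3 x 3 panel layout and the exponential alternative are arbitrary choices for illustration, not part of the original question. It draws nine QQ plots of n = 4 values that really are Gaussian, then estimates how often a Shapiro-Wilk test at alpha = 0.05 flags n = 4 draws from a clearly skewed (exponential) distribution.

```r
# Nine normal QQ plots, each from n = 4 values that are truly Gaussian,
# to show how much the points wander even when normality holds exactly.
set.seed(1)
op <- par(mfrow = c(3, 3), mar = c(2, 2, 1, 1))
for (i in 1:9) {
  x <- rnorm(4)
  qqnorm(x, main = "")
  qqline(x)
}
par(op)

# Proportion of n = 4 exponential samples that shapiro.test() flags as
# non-normal at alpha = 0.05 (the alternative and alpha are assumptions).
n_sim <- 2000
p_exp <- replicate(n_sim, shapiro.test(rexp(4))$p.value)
power_sw <- mean(p_exp < 0.05)
power_sw
```

The Gaussian panels will typically look anything but straight, and the rejection rate for the skewed samples is likely to be low, i.e. a real departure from normality will usually go undetected at this sample size.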

Second, one usually considers the distribution of the response when choosing
to fit a GLM, rather than deciding that the residuals from an LM are
non-Gaussian and then moving on to a GLM.
The decision to use the GLM should be motivated directly by the data and
question to hand. Perhaps sometimes we can get away with fitting the LM,
but that usually involves some thought, in which case one has probably
already thought about the GLM as well.
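A minimal sketch of choosing the family up front, assuming (hypothetically) that the response is a count, such as individuals per plot, recorded at 4 plots in each of 3 sites. The site names, true means and seed are made up for illustration; the point is only that `family = poisson` follows from the response being a count, before any LM residuals are looked at.

```r
# Hypothetical count data: 3 sites (factor levels), 4 plots per site.
set.seed(42)
site <- factor(rep(c("A", "B", "C"), each = 4))
mu <- c(A = 3, B = 6, C = 12)[as.character(site)]  # assumed true means
counts <- rpois(length(site), lambda = mu)

# The family is chosen from the nature of the response (non-negative
# integer counts), not from a normality check on LM residuals.
fit_glm <- glm(counts ~ site, family = poisson(link = "log"))
summary(fit_glm)

# Likelihood-ratio test of the site effect
anova(fit_glm, test = "Chisq")
```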

G


>
> But just as important is considering how you intend to use your
> data and exactly what it is. Regardless of what the statistics say, if
> you only have 4 data points, are you really confident in making broad
> generalisations from them? And in writing a paper with your name on it?
> Just a couple of data points could change everything, particularly if the
> scale isn't bounded, so that outliers can have a big impact. If the data
> points are some form of average I would be more confident with only 4 of
> them, but if they are raw values I would be very cautious about any
> conclusions you draw.
>
> Another reason I would be cautious of a result based on only 4 data
> points is that the p values may be very poorly estimated. Although it is
> not widely discussed, we often rely on the Central Limit Theorem (CLT) to
> assume that parameter estimates are normally distributed when calculating
> the p value. (Because parameter estimates can be thought of as weighted
> averages, the CLT applies to them.) With only 4 data points we can't invoke
> the magic of the CLT, and since there is no way to test whether the
> parameter estimates are normal, we take quite a risk in assuming we have
> accurate p values at small sample sizes.
>
> Chris Howden
> Founding Partner
> Tricky Solutions
> Tricky Solutions 4 Tricky Problems
> Evidence Based Strategic Development, IP Commercialisation and
> Innovation, Data Analysis, Modelling and Training
>
> (mobile) 0410 689 945
> (fax / office)
> chris at trickysolutions.com.au
>
> On 22 Oct 2014, at 17:20, V. Coudrain <v_coudrain at voila.fr> wrote:
>
> >> With such a small data set, why not simulate some data sets with
> >> reasonable effect sizes and see how an analysis performs? Krzysztof
> >
> > Dear Krzysztof,
> > It is a good idea. Would you know of some R functions that are well
> > suited for this kind of simulation?
> >
> > _______________________________________________
> > R-sig-ecology mailing list
> > R-sig-ecology at r-project.org
> > https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>
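On the question of R functions for this kind of simulation: base R is enough for a first attempt, using `replicate()`, a random number generator such as `rnorm()`, and `lm()` plus `anova()`. Below is a minimal sketch; the number of levels, the 4 observations per level, the Gaussian errors and the effect size `delta` are all assumptions to be replaced with values that are reasonable for your study, and `sim_reject_rate()` is a hypothetical helper written only for this example.

```r
# Hypothetical helper: simulate many data sets with an assumed effect size
# and return the proportion in which a one-way ANOVA rejects at `alpha`.
# Assumed design: n_levels factor levels, n_per observations per level,
# Gaussian errors with standard deviation `sd`.
sim_reject_rate <- function(delta, n_per = 4, n_levels = 3, sd = 1,
                            n_sim = 1000, alpha = 0.05) {
  f  <- factor(rep(seq_len(n_levels), each = n_per))
  mu <- (as.integer(f) - 1) * delta  # linear trend in the level means
  pvals <- replicate(n_sim, {
    y <- rnorm(length(f), mean = mu, sd = sd)
    anova(lm(y ~ f))[["Pr(>F)"]][1]  # p value for the factor
  })
  mean(pvals < alpha)
}

set.seed(123)
type1 <- sim_reject_rate(delta = 0)      # no effect: should be close to alpha
power_est <- sim_reject_rate(delta = 1)  # assumed effect of 1 sd per level step
c(type1 = type1, power = power_est)
```

With `delta = 0` the rejection rate estimates the actual type I error rate; with a non-zero `delta` it estimates power. Swapping `rnorm()` for a skewed generator such as `rexp()` or `rpois()` is one way to check whether the nominal p values hold up at this sample size, which is the concern Chris raises about leaning on the CLT.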



-- 
Gavin Simpson, PhD
