[R-sig-eco] Regression with few observations per factor level

Chris Howden chris at trickysolutions.com.au
Thu Oct 23 01:24:02 CEST 2014


A good place to start is by looking at your residuals to determine whether
the normality assumption is being met. If it is not, then some form of GLM
whose error distribution correctly describes the data, or a non-parametric
method, should be used.
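Here is a minimal sketch of what "looking at your residuals" can mean for a factor-only model. The group names and values are hypothetical toy data, not from the original question. Each observation's residual is its value minus its factor level's mean. The pooled residuals are then paired with standard-normal quantiles, which is the numeric content of a normal Q-Q plot.

```python
from statistics import NormalDist, mean, stdev

def factor_residuals(groups):
    """Residuals of a one-way (factor-only) linear model: each observation
    minus the mean of its own factor level."""
    resid = {}
    for level, ys in groups.items():
        level_mean = mean(ys)
        resid[level] = [y - level_mean for y in ys]
    return resid

def normal_qq_pairs(residuals):
    """Pair sorted, standardized residuals with standard-normal quantiles at
    plotting positions (i - 0.5) / n. Points far from the line y = x suggest
    non-normal residuals. The scaling uses the plain sample SD, a rough choice
    that ignores the degrees of freedom used up by the level means."""
    n = len(residuals)
    s = stdev(residuals)
    observed = sorted(r / s for r in residuals)
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))

# Hypothetical toy data: 3 factor levels, 4 observations each.
groups = {
    "A": [2.1, 2.4, 1.9, 2.6],
    "B": [3.0, 3.3, 2.8, 7.5],
    "C": [1.2, 1.0, 1.5, 1.1],
}
pooled = [r for rs in factor_residuals(groups).values() for r in rs]
for theo, obs in normal_qq_pairs(pooled):
    print(f"{theo:6.2f}  {obs:6.2f}")
```

With only 4 observations per level, the residuals within a level are not independent, and any normality check has very little power. Read the output as a rough screen for gross problems such as the 7.5 in level B, not as proof of normality.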

Just as important, though, is considering exactly what your data are and
how you intend to use them. Regardless of what the statistics say, if you
only have 4 data points, are you really confident making broad
generalisations from them? And writing a paper with your name on it?
Just a couple of data points could change everything, particularly if the
scale isn't bounded, so that outliers can have a big impact. If the data
points are themselves some form of average I would be more confident with
only 4 of them, but if they are raw values I would be very cautious about
any conclusions you draw.
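To make the point about a single data point concrete, here is a tiny sketch with hypothetical numbers, made up for illustration and not taken from the question. It shows how far one value on an unbounded scale can move the mean of 4 observations, using leave-one-out means.

```python
from statistics import mean, median

def leave_one_out_means(values):
    """Mean of the data with each observation dropped in turn."""
    return [mean(values[:i] + values[i + 1:]) for i in range(len(values))]

# Hypothetical raw values on an unbounded scale; the last one is an outlier.
raw = [2.0, 2.5, 3.0, 12.0]
print(mean(raw), median(raw))     # 4.875 2.75
print(leave_one_out_means(raw))   # dropping the 12.0 gives a mean of 2.5
```

Dropping one of the 4 values moves the mean from 4.875 to anywhere between 2.5 and about 5.8. This is the kind of sensitivity to check before trusting a result built on so few raw values.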

Another reason I would be cautious of a result based on only 4 data points
is that its p values may be very poorly estimated. Although it is not widely
discussed, we often rely on the central limit theorem (CLT) to assume that
parameter estimates are normally distributed when calculating p values.
(Because parameter estimates can be thought of as weighted averages of the
data, the CLT applies to them.) With only 4 data points we can't invoke the
magic of the CLT, and since there is no practical way to test whether the
estimates are normally distributed, we take quite a risk in assuming our p
values are accurate at such small sample sizes.
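Krzysztof's suggestion in the quoted message, simulating data and seeing how an analysis performs, is also a direct way to check the p value concern. The following is a minimal sketch under stated assumptions: a one-sample t-test with n = 4 and three arbitrarily chosen data-generating distributions. It estimates how often a nominal 5% test rejects a null hypothesis that is actually true. For exactly normal data the t-test is exact, so the normal row acts as a control. The skewed rows show what can happen when the CLT cannot help. In R, the same loop could be written with replicate() and t.test().

```python
import math
import random
from statistics import mean, stdev

N = 4               # sample size, as in the question
T_CRIT = 3.182446   # two-sided 5% critical value of Student's t with N - 1 = 3 df

def t_stat(xs, mu0):
    """One-sample t statistic for H0: population mean == mu0."""
    return (mean(xs) - mu0) / (stdev(xs) / math.sqrt(len(xs)))

def type1_error_rate(draw, true_mean, reps=10000, seed=1):
    """Fraction of simulated data sets in which a nominal 5% t-test rejects
    a null hypothesis that is actually true."""
    rng = random.Random(seed)
    rejects = 0
    for _ in range(reps):
        xs = [draw(rng) for _ in range(N)]
        if abs(t_stat(xs, true_mean)) > T_CRIT:
            rejects += 1
    return rejects / reps

# Assumed data-generating distributions (illustrative choices only),
# each paired with its true population mean.
scenarios = {
    "normal":      (lambda rng: rng.gauss(0.0, 1.0), 0.0),
    "exponential": (lambda rng: rng.expovariate(1.0), 1.0),
    "lognormal":   (lambda rng: rng.lognormvariate(0.0, 1.0), math.exp(0.5)),
}
for name, (draw, true_mean) in scenarios.items():
    print(f"{name:12s} {type1_error_rate(draw, true_mean):.3f}")
```

Comparing each printed rate with 0.05 shows how far nominal p values can be trusted at this sample size. To explore your own situation, change the distributions, or change N together with its matching T_CRIT from a t table.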

Chris Howden
Founding Partner
Tricky Solutions
Tricky Solutions 4 Tricky Problems
Evidence Based Strategic Development, IP Commercialisation and
Innovation, Data Analysis, Modelling and Training

(mobile) 0410 689 945
(fax / office)
chris at trickysolutions.com.au

On 22 Oct 2014, at 17:20, V. Coudrain <v_coudrain at voila.fr> wrote:

>> With such a small data set, why not simulate some data sets with
>> reasonable effect sizes and see how an analysis performs?
>> Krzysztof
>
> Dear Krzysztof,
> It is a good idea. Would you know of some R functions that are well suited for this kind of simulation?
>
> _______________________________________________
> R-sig-ecology mailing list
> R-sig-ecology at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
