[R-sig-eco] Regression with few observations per factor level

stephen sefick ssefick at gmail.com
Mon Oct 20 16:50:47 CEST 2014


You are more or less performing an ANOVA/ANCOVA on your data? As pointed
out earlier, all of the normal-theory regression assumptions apply.
Assuming all of those are satisfied, then even if the confidence intervals
are wide, as long as there are significant differences between groups I
don't see why you couldn't correctly infer something about the treatments.
Maybe I am missing something.
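A minimal simulation sketch of the point above, not based on the original
poster's data. It reuses the var ~ trt + cov layout from the thread with 4
observations per group, normal constant-variance errors and no real treatment
effect, then counts how often the treatment F-test is significant at 0.05. The
seed, nsim, and the intercept and slope used to generate the data are
assumptions chosen only for illustration. If the assumptions hold, the rate
should stay close to the nominal 5% even with so few points.

```r
set.seed(42)   # arbitrary seed, only for reproducibility
pg   <- 4      # observations per group, as in the thread
nsim <- 2000   # number of simulated data sets (assumed)

p_values <- replicate(nsim, {
  d <- data.frame(trt = factor(rep(c("trt1", "trt2", "trt3", "trt4"), each = pg)),
                  cov = runif(pg * 4))
  # Response depends on the covariate only, so "no treatment effect" is true
  d$var <- rnorm(pg * 4, mean = 3 + 1.5 * d$cov)
  reduced <- lm(var ~ cov, data = d)
  full    <- lm(var ~ trt + cov, data = d)
  # F-test for the treatment term, adjusted for the covariate
  anova(reduced, full)[2, "Pr(>F)"]
})

# Share of simulated data sets with a (false) significant treatment effect
type1_rate <- mean(p_values < 0.05)
print(type1_rate)
```

The cost of the small sample shows up as low power and wide intervals rather
than as an inflated false-positive rate, provided the model is the right one.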

Stephen

On Mon, Oct 20, 2014 at 8:43 AM, Martin Weiser <weiser2 at natur.cuni.cz>
wrote:

> Hi,
>
> coefficients and their p-values are reliable if your data are OK and you
> know enough about the process that generated them to choose an
> appropriate model. With 4 points per line, it may be really difficult to
> identify a bad fit or outliers.
>
> For example: to work properly, simple linear regression needs the
> residuals to be drawn from a normal distribution whose variance is
> constant along the regression line. With 4 points, you can hardly
> check this, but if you know enough about the process that generated
> the data, you are safe. If you do not know, it is not easy to say
> anything about the nature of the process that generated the data.
>
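A rough simulation sketch of the constant-variance point. The group means are
the ones from Roman's example; the threefold standard deviation in trt4, the
seed, nsim and the helper name detect_rate are assumptions made for
illustration. Bartlett's test on the residuals is used here as one possible
check, and the sketch reports how often unequal variance is flagged at 0.05
with 4 versus 30 observations per group.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical helper: share of simulations in which Bartlett's test
# flags unequal residual variance across treatment groups.
detect_rate <- function(pg, nsim = 500, sd_last = 3) {
  trt        <- factor(rep(c("trt1", "trt2", "trt3", "trt4"), each = pg))
  group_mean <- rep(c(3, 1, 11, 30), each = pg)
  group_sd   <- rep(c(1, 1, 1, sd_last), each = pg)  # trt4 is noisier
  p <- replicate(nsim, {
    d <- data.frame(trt = trt, cov = runif(pg * 4))
    d$var <- rnorm(pg * 4, mean = group_mean, sd = group_sd)
    fit <- lm(var ~ trt + cov, data = d)
    bartlett.test(resid(fit), d$trt)$p.value
  })
  mean(p < 0.05)
}

rate_4  <- detect_rate(4)
rate_30 <- detect_rate(30)
print(c(pg4 = rate_4, pg30 = rate_30))
```

If the detection rate at 4 per group comes out well below the one at 30 per
group, that is the difficulty described above: the violation is there, but
the data are too few to reveal it.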
> If you know (or can assume) that there is a simple linear relationship,
> you can say: "the slope of this relationship is such and such", but if you
> want to estimate both the nature of the relationship ("A *linearly*
> depends on B") and its magnitude ("the slope of this relationship
> is ..."), p-values will not help you much.
>
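The distinction between the shape of the relationship and its slope can be
sketched the same way. Here the data are generated from a curved (quadratic)
relationship, and a straight-line fit is compared with a quadratic fit using
an F-test. The curvature, noise level, seed, nsim and the helper name
curvature_rate are all assumptions made for illustration.

```r
set.seed(7)  # arbitrary seed, only for reproducibility

# Hypothetical helper: share of simulations in which the quadratic term
# is significant, i.e. in which the curvature is detected.
curvature_rate <- function(n, nsim = 1000) {
  x <- seq(0, 1, length.out = n)
  p <- replicate(nsim, {
    y <- 1 + 4 * x^2 + rnorm(n, sd = 0.3)  # truly curved relationship
    linear    <- lm(y ~ x)
    quadratic <- lm(y ~ x + I(x^2))
    anova(linear, quadratic)[2, "Pr(>F)"]
  })
  mean(p < 0.05)
}

curve_4  <- curvature_rate(4)
curve_30 <- curvature_rate(30)
print(c(n4 = curve_4, n30 = curve_30))
```

With 4 points the quadratic fit leaves a single residual degree of freedom,
so the test has very little to work with. A significant slope from the
straight-line model then says nothing about whether a straight line was the
right shape to begin with.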
> Of course, I may be wrong - I am not a statistician, just a user.
>
> Best,
> Martin W.
>
>
> V. Coudrain wrote on Mon 20 Oct 2014 at 13:37 +0200:
> > Thank you very much. If I get it right, the CIs get wider, my test has
> > less power and the probability of getting a significant relation decreases.
> > What about the significant coefficients, are they reliable?
> >
> >
> >
> >
> > > Message of 20/10/14 at 11:30
> > > From: "Roman Luštrik"
> > > To: "V. Coudrain"
> > > Cc: "r-sig-ecology at r-project.org"
> > > Subject: Re: [R-sig-eco] Regression with few observations per factor level
> > >
> > > I think you can, but the confidence intervals will be rather large due
> > > to the small number of samples.
> > > Notice how the standard errors change when the sample size (per group)
> > > goes from 4 to 30.
> > >
> > > > pg <- 4 # pg = per group
> > > > my.df <- data.frame(var = c(rnorm(pg, mean = 3), rnorm(pg, mean = 1),
> > > +                             rnorm(pg, mean = 11), rnorm(pg, mean = 30)),
> > > +                     trt = rep(c("trt1", "trt2", "trt3", "trt4"), each = pg),
> > > +                     cov = runif(pg*4)) # 4 groups
> > > > summary(lm(var ~ trt + cov, data = my.df))
> > >
> > > Call:
> > > lm(formula = var ~ trt + cov, data = my.df)
> > >
> > > Residuals:
> > >      Min       1Q   Median       3Q      Max
> > > -1.63861 -0.46080  0.03332  0.66380  1.27974
> > >
> > > Coefficients:
> > >             Estimate Std. Error t value Pr(>|t|)
> > > (Intercept)   1.2345     1.0218   1.208    0.252
> > > trttrt2      -0.7759     0.8667  -0.895    0.390
> > > trttrt3       7.8503     0.8308   9.449  1.3e-06 ***
> > > trttrt4      28.2685     0.9050  31.236  4.3e-12 ***
> > > cov           1.4027     1.1639   1.205    0.253
> > > ---
> > > Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> > >
> > > Residual standard error: 1.154 on 11 degrees of freedom
> > > Multiple R-squared:  0.9932, Adjusted R-squared:  0.9908
> > > F-statistic: 404.4 on 4 and 11 DF,  p-value: 7.467e-12
> > >
> > > > pg <- 30 # pg = per group
> > > > my.df <- data.frame(var = c(rnorm(pg, mean = 3), rnorm(pg, mean = 1),
> > > +                             rnorm(pg, mean = 11), rnorm(pg, mean = 30)),
> > > +                     trt = rep(c("trt1", "trt2", "trt3", "trt4"), each = pg),
> > > +                     cov = runif(pg*4)) # 4 groups
> > > > summary(lm(var ~ trt + cov, data = my.df))
> > >
> > > Call:
> > > lm(formula = var ~ trt + cov, data = my.df)
> > >
> > > Residuals:
> > >     Min      1Q  Median      3Q     Max
> > > -2.5778 -0.6584 -0.0185  0.6423  3.2077
> > >
> > > Coefficients:
> > >             Estimate Std. Error t value Pr(>|t|)
> > > (Intercept)  2.76961    0.25232  10.977  < 2e-16 ***
> > > trttrt2     -1.75490    0.28546  -6.148 1.17e-08 ***
> > > trttrt3      8.40521    0.28251  29.752  < 2e-16 ***
> > > trttrt4     27.04095    0.28286  95.599  < 2e-16 ***
> > > cov          0.05129    0.32523   0.158    0.875
> > > ---
> > > Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> > >
> > > Residual standard error: 1.094 on 115 degrees of freedom
> > > Multiple R-squared:  0.9913, Adjusted R-squared:  0.991
> > > F-statistic:  3269 on 4 and 115 DF,  p-value: < 2.2e-16
> > >
> > > On Mon, Oct 20, 2014 at 10:53 AM, V. Coudrain wrote:
> > > Hi, I would like to test the impact of a treatment on some variable
> > > using regression (e.g. lm(var ~ trt + cov)). However, I only have four
> > > observations per factor level. Is it still possible to apply a regression
> > > with such a small sample size? I think that it should be difficult to
> > > correctly estimate the variance. Do you think that I should rather use a
> > > non-parametric test such as Kruskal-Wallis? However, I need to include
> > > covariates in my models and I am not sure if basic non-parametric tests
> > > are suitable for this. Thanks for any suggestion.
> > >
> > > _______________________________________________
> > > R-sig-ecology mailing list
> > > R-sig-ecology at r-project.org
> > > https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
> > >
> > >
> >
> > > --
> > > In God we trust, all others bring data.
> >
>
>



-- 
Stephen Sefick
**************************************************
Auburn University
Biological Sciences
331 Funchess Hall
Auburn, Alabama
36849
**************************************************
sas0025 at auburn.edu
http://www.auburn.edu/~sas0025
**************************************************

Let's not spend our time and resources thinking about things that are so
little or so large that all they really do for us is puff us up and make us
feel like gods.  We are mammals, and have not exhausted the annoying little
problems of being mammals.

                                -K. Mullis

"A big computer, a complex algorithm and a long time does not equal
science."

                              -Robert Gentleman
