[R-sig-eco] Regression with few observations per factor level

Martin Weiser weiser2 at natur.cuni.cz
Mon Oct 20 15:43:40 CEST 2014


Hi,

coefficients and their p-values are reliable if your data are OK and you
know enough about the process that generated them to choose an
appropriate model. With 4 points per level, it may be really difficult
to identify a bad fit or outliers.

For example: simple linear regression assumes that residuals are drawn
from a normal distribution with constant variance all along the
regression line. With 4 points, you can hardly check this, but if you
know enough about the process that generated the data, you are safe. If
you do not, it is not easy to say much about the nature of that process.
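
To give a concrete (entirely made-up) illustration, here is a quick sketch in R, using the same kind of design as in the example below (4 levels, 4 points each, one covariate), of how little the usual residual diagnostics can show with so few points:

set.seed(1)                               # made-up example data
pg  <- 4                                  # 4 observations per level
dat <- data.frame(
  var = c(rnorm(pg, mean = 3), rnorm(pg, mean = 1),
          rnorm(pg, mean = 11), rnorm(pg, mean = 30)),
  trt = rep(c("trt1", "trt2", "trt3", "trt4"), each = pg),
  cov = runif(pg * 4)
)
fit <- lm(var ~ trt + cov, data = dat)

# Residuals-vs-fitted and normal Q-Q plots: with only 16 points and
# 5 estimated coefficients, non-constant variance or outliers are
# very hard to see here.
par(mfrow = c(1, 2))
plot(fit, which = 1:2)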

If you know (or can assume) that there is a simple linear relationship,
you can say "the slope of this relationship is such and such", but if
you want to estimate both the nature of the relationship ("A *linearly*
depends on B") and its magnitude ("the slope of this relationship
is ..."), p-values will not help you much.

Of course, I may be wrong - I am not a statistician, just a user.

Best,
Martin W. 


V. Coudrain wrote on Mon, 20. 10. 2014 at 13:37 +0200:
> Thank you very much. If I get it right, the CIs get wider, my test has less power, and the probability of getting a significant relationship decreases. What about the significant coefficients: are they reliable?
> 
> 
> 
> 
> > Message of 20/10/14 at 11:30
> > From: "Roman Luštrik" 
> > To: "V. Coudrain" 
> > Cc: "r-sig-ecology at r-project.org" 
> > Subject: Re: [R-sig-eco] Regression with few observations per factor level
> > 
> > I think you can, but the confidence intervals will be rather large due to the small number of samples.
> > Notice how the standard errors change as the sample size (per group) goes from 4 to 30.
> > > pg <- 4 # pg = per group
> > > my.df <- data.frame(var = c(rnorm(pg, mean = 3), rnorm(pg, mean = 1), rnorm(pg, mean = 11), rnorm(pg, mean = 30)),
> > +                     trt = rep(c("trt1", "trt2", "trt3", "trt4"), each = pg),
> > +                     cov = runif(pg*4)) # 4 groups
> > > summary(lm(var ~ trt + cov, data = my.df))
> > 
> > Call:
> > lm(formula = var ~ trt + cov, data = my.df)
> > 
> > Residuals:
> >      Min       1Q   Median       3Q      Max 
> > -1.63861 -0.46080  0.03332  0.66380  1.27974 
> > 
> > Coefficients:
> >             Estimate Std. Error t value Pr(>|t|)    
> > (Intercept)   1.2345     1.0218   1.208    0.252    
> > trttrt2      -0.7759     0.8667  -0.895    0.390    
> > trttrt3       7.8503     0.8308   9.449  1.3e-06 ***
> > trttrt4      28.2685     0.9050  31.236  4.3e-12 ***
> > cov           1.4027     1.1639   1.205    0.253    
> > ---
> > Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> > 
> > Residual standard error: 1.154 on 11 degrees of freedom
> > Multiple R-squared:  0.9932,  Adjusted R-squared:  0.9908 
> > F-statistic: 404.4 on 4 and 11 DF,  p-value: 7.467e-12
> > 
> > > pg <- 30 # pg = per group
> > > my.df <- data.frame(var = c(rnorm(pg, mean = 3), rnorm(pg, mean = 1), rnorm(pg, mean = 11), rnorm(pg, mean = 30)),
> > +                     trt = rep(c("trt1", "trt2", "trt3", "trt4"), each = pg),
> > +                     cov = runif(pg*4)) # 4 groups
> > > summary(lm(var ~ trt + cov, data = my.df))
> > 
> > Call:
> > lm(formula = var ~ trt + cov, data = my.df)
> > 
> > Residuals:
> >     Min      1Q  Median      3Q     Max 
> > -2.5778 -0.6584 -0.0185  0.6423  3.2077 
> > 
> > Coefficients:
> >             Estimate Std. Error t value Pr(>|t|)    
> > (Intercept)  2.76961    0.25232  10.977  < 2e-16 ***
> > trttrt2     -1.75490    0.28546  -6.148 1.17e-08 ***
> > trttrt3      8.40521    0.28251  29.752  < 2e-16 ***
> > trttrt4     27.04095    0.28286  95.599  < 2e-16 ***
> > cov          0.05129    0.32523   0.158    0.875    
> > ---
> > Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> > 
> > Residual standard error: 1.094 on 115 degrees of freedom
> > Multiple R-squared:  0.9913,  Adjusted R-squared:  0.991 
> > F-statistic:  3269 on 4 and 115 DF,  p-value: < 2.2e-16
> > On Mon, Oct 20, 2014 at 10:53 AM, V. Coudrain  wrote:
> > Hi, I would like to test the impact of a treatment on some variable using regression (e.g. lm(var ~ trt + cov)). However, I only have four observations per factor level. Is it still possible to apply a regression with such a small sample size? I think it should be difficult to correctly estimate the variance. Do you think I should rather compute a non-parametric test such as Kruskal-Wallis? However, I need to include covariates in my models and I am not sure if basic non-parametric tests are suitable for this. Thanks for any suggestion.
> 
> > -- 
> > In God we trust, all others bring data. 
> 







More information about the R-sig-ecology mailing list