[R-sig-eco] Regression with few observations per factor level

V. Coudrain v_coudrain at voila.fr
Mon Oct 20 13:37:59 CEST 2014


Thank you very much. If I understand correctly, the confidence intervals get wider, my test has less power, and the probability of detecting a significant relationship decreases. What about the coefficients that do come out significant: are they reliable?
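
One way to explore this question is by simulation. The sketch below is not part of the original exchange. It reuses the model from the quoted reply, lm(var ~ trt + cov), and repeats the simulation many times for 4 and for 30 observations per group. The helper name sim_trt2 and the arguments n_sim and true_diff are made up for illustration, and the data are assumed to be normal with unit variance, as in the quoted example. If the model assumptions hold, a significant coefficient from a small sample is still a valid test result at the nominal level, but when power is low the estimates that happen to reach significance tend to overstate the size of the effect.

```r
# Hypothetical helper (not from the thread): refit the thread's model on
# freshly simulated data and record, for the trt2 coefficient, the estimate,
# the width of its 95% confidence interval and its p-value.
sim_trt2 <- function(pg, n_sim = 1000, true_diff = -2) {
  trt <- factor(rep(c("trt1", "trt2", "trt3", "trt4"), each = pg))
  res <- replicate(n_sim, {
    my.df <- data.frame(
      var = c(rnorm(pg, mean = 3), rnorm(pg, mean = 3 + true_diff),
              rnorm(pg, mean = 11), rnorm(pg, mean = 30)),
      trt = trt,
      cov = runif(pg * 4)  # covariate unrelated to the response
    )
    fit <- lm(var ~ trt + cov, data = my.df)
    ci <- confint(fit)["trttrt2", ]
    c(est   = unname(coef(fit)["trttrt2"]),
      width = unname(ci[2] - ci[1]),
      p     = summary(fit)$coefficients["trttrt2", "Pr(>|t|)"])
  })
  data.frame(t(res))  # one row per simulated data set
}

set.seed(1)
for (pg in c(4, 30)) {
  s <- sim_trt2(pg)
  sig <- s$p < 0.05
  cat("pg =", pg,
      "| mean CI width:", round(mean(s$width), 2),
      "| power:", round(mean(sig), 2),
      "| mean estimate when significant:", round(mean(s$est[sig]), 2), "\n")
}
```

Comparing the last column with true_diff shows how far the significant estimates drift from the true difference at each sample size. Choosing a true_diff closer to zero lowers the power and should make the drift more visible.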




> Message of 20/10/14 at 11:30
> From: "Roman Luštrik"
> To: "V. Coudrain"
> Cc: "r-sig-ecology at r-project.org"
> Subject: Re: [R-sig-eco] Regression with few observations per factor level
> 
> I think you can, but the confidence intervals will be rather large due to the small number of samples.
> Notice how the standard errors change when the sample size (per group) goes from 4 to 30.
>
> > pg <- 4 # pg = per group
> > my.df <- data.frame(var = c(rnorm(pg, mean = 3), rnorm(pg, mean = 1), rnorm(pg, mean = 11), rnorm(pg, mean = 30)),
> +                     trt = rep(c("trt1", "trt2", "trt3", "trt4"), each = pg),
> +                     cov = runif(pg*4)) # 4 groups
> > summary(lm(var ~ trt + cov, data = my.df))
>
> Call:
> lm(formula = var ~ trt + cov, data = my.df)
>
> Residuals:
>      Min       1Q   Median       3Q      Max
> -1.63861 -0.46080  0.03332  0.66380  1.27974
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)   1.2345     1.0218   1.208    0.252
> trttrt2      -0.7759     0.8667  -0.895    0.390
> trttrt3       7.8503     0.8308   9.449  1.3e-06 ***
> trttrt4      28.2685     0.9050  31.236  4.3e-12 ***
> cov           1.4027     1.1639   1.205    0.253
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 1.154 on 11 degrees of freedom
> Multiple R-squared:  0.9932,  Adjusted R-squared:  0.9908
> F-statistic: 404.4 on 4 and 11 DF,  p-value: 7.467e-12
>
> > pg <- 30 # pg = per group
> > my.df <- data.frame(var = c(rnorm(pg, mean = 3), rnorm(pg, mean = 1), rnorm(pg, mean = 11), rnorm(pg, mean = 30)),
> +                     trt = rep(c("trt1", "trt2", "trt3", "trt4"), each = pg),
> +                     cov = runif(pg*4)) # 4 groups
> > summary(lm(var ~ trt + cov, data = my.df))
>
> Call:
> lm(formula = var ~ trt + cov, data = my.df)
>
> Residuals:
>     Min      1Q  Median      3Q     Max
> -2.5778 -0.6584 -0.0185  0.6423  3.2077
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)  2.76961    0.25232  10.977  < 2e-16 ***
> trttrt2     -1.75490    0.28546  -6.148 1.17e-08 ***
> trttrt3      8.40521    0.28251  29.752  < 2e-16 ***
> trttrt4     27.04095    0.28286  95.599  < 2e-16 ***
> cov          0.05129    0.32523   0.158    0.875
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 1.094 on 115 degrees of freedom
> Multiple R-squared:  0.9913,  Adjusted R-squared:  0.991
> F-statistic:  3269 on 4 and 115 DF,  p-value: < 2.2e-16
>
> On Mon, Oct 20, 2014 at 10:53 AM, V. Coudrain  wrote:
> Hi, I would like to test the impact of a treatment on some variable using regression (e.g. lm(var ~ trt + cov)). However, I only have four observations per factor level. Is it still possible to apply a regression with such a small sample size? I think that it would be difficult to estimate the variance correctly. Do you think I should rather compute a non-parametric test such as Kruskal-Wallis? However, I need to include covariates in my models, and I am not sure whether basic non-parametric tests are suitable for this. Thanks for any suggestions.
> 
> _______________________________________________
> R-sig-ecology mailing list
> R-sig-ecology at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
> 
> 

> -- 
> In God we trust, all others bring data. 



