[R-sig-eco] Regression with few observations per factor level
Martin Weiser
weiser2 at natur.cuni.cz
Mon Oct 20 15:43:40 CEST 2014
Hi,
coefficients and their p-values are reliable if your data are OK and you
know enough about the process that generated them to choose an
appropriate model. With 4 points per line, it may be really difficult to
identify a bad fit or outliers.
For example: simple linear regression needs the residuals to be drawn
from a normal distribution whose variance is constant along the
regression line. With 4 points you can hardly check this from the data,
so you are safe only if you already know enough about the process that
generated them. If you do not, the data alone will not tell you much
about the nature of that process.
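To make the point about constant variance concrete, here is a rough simulation sketch. It assumes two groups whose standard deviations differ by a factor of 3 and uses Bartlett's test (`bartlett.test` from the stats package) as one possible check; the helper name `detect_rate`, the seed, the factor of 3 and the number of simulations are all arbitrary choices for illustration, not something taken from the original question.

```r
# Sketch: how often does Bartlett's test flag a 3-fold difference in SD
# between two groups, with 4 versus 30 observations per group?
set.seed(1)  # arbitrary seed, only for reproducibility

# detect_rate() is a hypothetical helper written for this illustration
detect_rate <- function(pg, n_sim = 2000) {
  hits <- replicate(n_sim, {
    y <- c(rnorm(pg, sd = 1), rnorm(pg, sd = 3))  # second group has 3x the SD
    g <- factor(rep(c("a", "b"), each = pg))
    bartlett.test(y, g)$p.value < 0.05            # TRUE if unequal variance is detected
  })
  mean(hits)  # proportion of simulations in which the difference was detected
}

rate_small <- detect_rate(4)   # 4 observations per group
rate_large <- detect_rate(30)  # 30 observations per group
c(pg4 = rate_small, pg30 = rate_large)
```

If the detection rate with 4 per group comes out well below the rate with 30 per group, that is the problem described above: the assumption may be badly violated and the data will usually not show it.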
If you know (or can assume) that there is a simple linear relationship,
you can say: "the slope of this relationship is such and such". But if
you want to estimate both the nature of the relationship ("A *linearly*
depends on B") and its magnitude ("the slope of this relationship
is ..."), p-values will not help you much.
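A small sketch of what I mean, under the assumption (made up for this illustration) that the true process is quadratic and that we only have 4 points. The linear model will often still give a convincing slope and p-value, while the comparison with a quadratic model has only one residual degree of freedom left to work with.

```r
# Sketch: 4 points from a curved process, fitted with a straight line.
set.seed(2)                        # arbitrary seed, only for reproducibility
x <- c(1, 2, 3, 4)
y <- x^2 + rnorm(4, sd = 0.5)      # assumed true process: quadratic plus noise

fit_lin  <- lm(y ~ x)              # the model we would fit if we assumed linearity
fit_quad <- lm(y ~ x + I(x^2))     # alternative model allowing curvature

summary(fit_lin)$coefficients      # slope estimate and its p-value under linearity
anova(fit_lin, fit_quad)           # lack-of-fit comparison, 1 residual df remaining
df.residual(fit_lin)               # residual df of the linear fit
df.residual(fit_quad)              # residual df of the quadratic fit
```

So the p-value of the slope answers "how big is the slope, given that the relationship is linear", not "is the relationship linear".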
Of course, I may be wrong - I am not a statistician, just a user.
Best,
Martin W.
V. Coudrain wrote on Mon, 20 Oct 2014 at 13:37 +0200:
> Thank you very much. If I get it right, the CIs get wider, my test has less power and the probability of getting a significant relation decreases. What about the significant coefficients, are they reliable?
>
> > Message of 20/10/14 at 11:30
> > From: "Roman Luštrik"
> > To: "V. Coudrain"
> > Cc: "r-sig-ecology at r-project.org"
> > Subject: Re: [R-sig-eco] Regression with few observations per factor level
> >
> > I think you can, but the confidence intervals will be rather large due to the small number of samples.
> > Notice how the standard errors change as the sample size (per group) goes from 4 to 30.
> > > pg <- 4 # pg = per group
> > > my.df <- data.frame(var = c(rnorm(pg, mean = 3), rnorm(pg, mean = 1), rnorm(pg, mean = 11), rnorm(pg, mean = 30)),
> > +                     trt = rep(c("trt1", "trt2", "trt3", "trt4"), each = pg),
> > +                     cov = runif(pg*4)) # 4 groups
> > > summary(lm(var ~ trt + cov, data = my.df))
> >
> > Call:
> > lm(formula = var ~ trt + cov, data = my.df)
> >
> > Residuals:
> >      Min       1Q   Median       3Q      Max
> > -1.63861 -0.46080  0.03332  0.66380  1.27974
> >
> > Coefficients:
> >             Estimate Std. Error t value Pr(>|t|)
> > (Intercept)   1.2345     1.0218   1.208    0.252
> > trttrt2      -0.7759     0.8667  -0.895    0.390
> > trttrt3       7.8503     0.8308   9.449  1.3e-06 ***
> > trttrt4      28.2685     0.9050  31.236  4.3e-12 ***
> > cov           1.4027     1.1639   1.205    0.253
> > ---
> > Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> >
> > Residual standard error: 1.154 on 11 degrees of freedom
> > Multiple R-squared: 0.9932, Adjusted R-squared: 0.9908
> > F-statistic: 404.4 on 4 and 11 DF, p-value: 7.467e-12
> >
> > > pg <- 30 # pg = per group
> > > my.df <- data.frame(var = c(rnorm(pg, mean = 3), rnorm(pg, mean = 1), rnorm(pg, mean = 11), rnorm(pg, mean = 30)),
> > +                     trt = rep(c("trt1", "trt2", "trt3", "trt4"), each = pg),
> > +                     cov = runif(pg*4)) # 4 groups
> > > summary(lm(var ~ trt + cov, data = my.df))
> >
> > Call:
> > lm(formula = var ~ trt + cov, data = my.df)
> >
> > Residuals:
> >     Min      1Q  Median      3Q     Max
> > -2.5778 -0.6584 -0.0185  0.6423  3.2077
> >
> > Coefficients:
> >             Estimate Std. Error t value Pr(>|t|)
> > (Intercept)  2.76961    0.25232  10.977  < 2e-16 ***
> > trttrt2     -1.75490    0.28546  -6.148 1.17e-08 ***
> > trttrt3      8.40521    0.28251  29.752  < 2e-16 ***
> > trttrt4     27.04095    0.28286  95.599  < 2e-16 ***
> > cov          0.05129    0.32523   0.158    0.875
> > ---
> > Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> >
> > Residual standard error: 1.094 on 115 degrees of freedom
> > Multiple R-squared: 0.9913, Adjusted R-squared: 0.991
> > F-statistic: 3269 on 4 and 115 DF, p-value: < 2.2e-16
> >
> > On Mon, Oct 20, 2014 at 10:53 AM, V. Coudrain wrote:
> > Hi, I would like to test the impact of a treatment on some variable using regression (e.g. lm(var ~ trt + cov)). However, I only have four observations per factor level. Is it still possible to apply a regression with such a small sample size? I think that it would be difficult to estimate the variance correctly. Do you think that I should rather compute a non-parametric test such as Kruskal-Wallis? However, I need to include covariables in my models and I am not sure whether basic non-parametric tests are suitable for this. Thanks for any suggestion.
> >
> > _______________________________________________
> > R-sig-ecology mailing list
> > R-sig-ecology at r-project.org
> > https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
> >
> >
>
> > --
> > In God we trust, all others bring data.
>