[R-sig-eco] Multiple regression
Gavin Simpson
gavin.simpson at ucl.ac.uk
Tue Jun 1 10:52:43 CEST 2010
On Mon, 2010-05-31 at 13:44 +1000, Zhongkui Luo wrote:
> Dear all,
>
> # I have the following sub-datasets of soil carbon change at four sites.
> There are four treatments at each site.
>
> DltC <- c(-19.237, -14.857, -14.818, -14.815, -11.014, 3.349, 4.332, 3.956,
> -7.638, 9.469, 14.189, 13.037, -9.809, 5.459, 8.748, 11.511)
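What you are attempting makes very little sense, especially considering
the small size of your data set. The four-way interaction model has 16
coefficients, but your predictors take (almost) one value per site, so
the 16 observations contain far fewer distinct predictor combinations.
You can't fit such a complex model because you don't have enough
information to estimate all the parameters; hence the NAs, and the
statement:
Coefficients: (9 not defined because of singularities)
in the printed output.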
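To make the singularity concrete, here is a minimal sketch (an illustration, not something run in the original exchange) that compares the number of coefficients the four-way model asks for with the information the predictors actually carry. The predictor values are copied from the quoted post; the names X, n_coef, n_distinct and rank_X are introduced here purely for illustration.

```r
# Predictors copied verbatim from the quoted post
f.BIOM <- c(0.0294, 0.0294, 0.0294, 0.0294, 0.0169, 0.0169, 0.0169, 0.0169,
            0.0172, 0.0172, 0.0172, 0.0172, 0.0208, 0.0208, 0.0208, 0.0208)
f.FOM <- c(0.183, 0.183, 0.183, 0.183, 0.0223, 0.0223, 0.0223, 0.0223,
           0.0168, 0.0168, 0.0168, 0.0168, 0.00766, 0.00766, 0.00766, 0.00766)
f.Inert_C <- c(0.197, 0.197, 0.197, 0.197, 0.466, 0.466, 0.466, 0.4666,
               0.5336, 0.533, 0.533, 0.533, 0.333, 0.333, 0.3333, 0.3333)
f.HUM <- c(0.589, 0.589, 0.589, 0.589, 0.494, 0.494, 0.494, 0.494,
           0.432, 0.432, 0.432, 0.432, 0.638, 0.638, 0.638, 0.638)

# Design matrix of the full four-way interaction model
X <- model.matrix(~ f.BIOM * f.FOM * f.Inert_C * f.HUM)

n_coef <- ncol(X)  # coefficients the model asks for
# Distinct predictor combinations: an upper bound on the rank of X
n_distinct <- nrow(unique(data.frame(f.BIOM, f.FOM, f.Inert_C, f.HUM)))
rank_X <- qr(X)$rank  # numerical rank of the design matrix

print(c(coefficients = n_coef, distinct_rows = n_distinct, rank = rank_X))
```

The rank cannot exceed the number of distinct predictor rows, and every coefficient beyond the rank is reported as NA. Note that several of the distinct rows exist only because a few f.Inert_C values in the post differ slightly within a site (e.g. 0.4666 next to 0.466); whether those differences are intended is not clear from the post.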
As to why the different terms are NA between the two fits, this is
explained in ?model.matrix (linked to from ?lm) --- read the bit about
varying fastest.
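One way to see the order dependence directly is the sketch below, which refits both models from the quoted post and lists which coefficients come back as NA. As I understand it, lm() works through the columns of the model matrix in the order they are built and sets to NA any column that is a linear combination of earlier ones, so reordering the formula changes which terms are sacrificed. The helper name dropped and the objects na1 and na2 are hypothetical, introduced only for this illustration.

```r
# Data copied verbatim from the quoted post
DltC <- c(-19.237, -14.857, -14.818, -14.815, -11.014, 3.349, 4.332, 3.956,
          -7.638, 9.469, 14.189, 13.037, -9.809, 5.459, 8.748, 11.511)
f.BIOM <- c(0.0294, 0.0294, 0.0294, 0.0294, 0.0169, 0.0169, 0.0169, 0.0169,
            0.0172, 0.0172, 0.0172, 0.0172, 0.0208, 0.0208, 0.0208, 0.0208)
f.FOM <- c(0.183, 0.183, 0.183, 0.183, 0.0223, 0.0223, 0.0223, 0.0223,
           0.0168, 0.0168, 0.0168, 0.0168, 0.00766, 0.00766, 0.00766, 0.00766)
f.Inert_C <- c(0.197, 0.197, 0.197, 0.197, 0.466, 0.466, 0.466, 0.4666,
               0.5336, 0.533, 0.533, 0.533, 0.333, 0.333, 0.3333, 0.3333)
f.HUM <- c(0.589, 0.589, 0.589, 0.589, 0.494, 0.494, 0.494, 0.494,
           0.432, 0.432, 0.432, 0.432, 0.638, 0.638, 0.638, 0.638)

# Same model, two orderings of the variables
fit1 <- lm(DltC ~ f.BIOM * f.FOM * f.Inert_C * f.HUM)
fit2 <- lm(DltC ~ f.Inert_C * f.HUM * f.BIOM * f.FOM)

# Hypothetical helper: names of the coefficients lm() could not estimate
dropped <- function(fit) names(coef(fit))[is.na(coef(fit))]

na1 <- dropped(fit1)
na2 <- dropped(fit2)
print(na1)
print(na2)

# Number of coefficients each fit was actually able to estimate
print(c(fit1 = fit1$rank, fit2 = fit2$rank))
```

Which terms survive is therefore an accident of ordering rather than evidence about the terms themselves, which is one more reason not to read much into the "significant" interactions in either fit.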
I'm not sure what you were hoping to achieve by throwing the kitchen
sink of a model at your data, but in many (most?) cases this is not
advisable. I presume you'll want to simplify the model by dropping
"insignificant" terms (for some definition of "insignificant"). You must
take great care not to get into data dredging. Alternatives to this
model selection process are available (e.g. ridge regression, the lasso)
and should be used if you aren't able to /a priori/ specify a reasonable
model. This note in Ecology Letters makes reference to these techniques:
http://www3.interscience.wiley.com/journal/123356334/abstract
However, I doubt these will help in this particular case; your data set
is likely much too small to evaluate all the terms and interactions you
want to evaluate. Collect some more data and rethink the
processes/effects you are trying to capture with your statistical model,
and then include only those terms.
HTH
G
>
> # Soil C fractions at the start of the experiment at the four sites are:
>
> # The four treatments at each site share the same initial soil C fractions
> f.BIOM <- c(0.0294, 0.0294, 0.0294, 0.0294, 0.0169, 0.0169, 0.0169, 0.0169,
>             0.0172, 0.0172, 0.0172, 0.0172, 0.0208, 0.0208, 0.0208, 0.0208)
> f.FOM <- c(0.183, 0.183, 0.183, 0.183, 0.0223, 0.0223, 0.0223, 0.0223,
>            0.0168, 0.0168, 0.0168, 0.0168, 0.00766, 0.00766, 0.00766, 0.00766)
> f.Inert_C <- c(0.197, 0.197, 0.197, 0.197, 0.466, 0.466, 0.466, 0.4666,
>                0.5336, 0.533, 0.533, 0.533, 0.333, 0.333, 0.3333, 0.3333)
> f.HUM <- c(0.589, 0.589, 0.589, 0.589, 0.494, 0.494, 0.494, 0.494,
>            0.432, 0.432, 0.432, 0.432, 0.638, 0.638, 0.638, 0.638)
>
> # Applying multiple regression model to the data:
>
> fit1 <- lm(DltC ~ f.BIOM*f.FOM*f.Inert_C*f.HUM)
> # fit2 just changes the order of the four variables
> fit2 <- lm(DltC ~ f.Inert_C*f.HUM*f.BIOM*f.FOM)
> summary(fit1)
> summary(fit2)
>
>
> # Coefficients of fit1:
>
> Coefficients: (9 not defined because of singularities)
>                               Estimate Std. Error t value Pr(>|t|)
> (Intercept)                     206698      54914   3.764  0.00446 **
> f.BIOM                        -4477270    1716046  -2.609  0.02831 *
> f.FOM                         -2227245     794993  -2.802  0.02066 *
> f.Inert_C                      -996522     280627  -3.551  0.00621 **
> f.HUM                          -172647      56211  -3.071  0.01332 *
> f.BIOM:f.FOM                        NA         NA      NA       NA
> f.BIOM:f.Inert_C              46171278   12725547   3.628  0.00550 **
> f.FOM:f.Inert_C               10074999    3442541   2.927  0.01685 *
> f.BIOM:f.HUM                        NA         NA      NA       NA
> f.FOM:f.HUM                         NA         NA      NA       NA
> f.Inert_C:f.HUM                     NA         NA      NA       NA
> f.BIOM:f.FOM:f.Inert_C              NA         NA      NA       NA
> f.BIOM:f.FOM:f.HUM                  NA         NA      NA       NA
> f.BIOM:f.Inert_C:f.HUM              NA         NA      NA       NA
> f.FOM:f.Inert_C:f.HUM               NA         NA      NA       NA
> f.BIOM:f.FOM:f.Inert_C:f.HUM        NA         NA      NA       NA
>
> # Coefficients of fit2:
> Coefficients: (9 not defined because of singularities)
>                               Estimate Std. Error t value Pr(>|t|)
> (Intercept)                      21052      36457   0.577   0.5778
> f.Inert_C                       -54158     112672  -0.481   0.6422
> f.HUM                          -291540      96625  -3.017   0.0145 *
> f.BIOM                         7364603    4214425   1.747   0.1145
> f.FOM                          -242470     123836  -1.958   0.0819 .
> f.Inert_C:f.HUM                 603519     206217   2.927   0.0168 *
> f.Inert_C:f.BIOM             -13939751   10567698  -1.319   0.2197
> f.HUM:f.BIOM                        NA         NA      NA       NA
> f.Inert_C:f.FOM                     NA         NA      NA       NA
> f.HUM:f.FOM                         NA         NA      NA       NA
> f.BIOM:f.FOM                        NA         NA      NA       NA
> f.Inert_C:f.HUM:f.BIOM              NA         NA      NA       NA
> f.Inert_C:f.HUM:f.FOM               NA         NA      NA       NA
> f.Inert_C:f.BIOM:f.FOM              NA         NA      NA       NA
> f.HUM:f.BIOM:f.FOM                  NA         NA      NA       NA
> f.Inert_C:f.HUM:f.BIOM:f.FOM        NA         NA      NA       NA
>
>
> # Compared with fit1, the coefficients in fit2 are quite different; e.g., the
> effect of the interaction between f.Inert_C and f.HUM is significant in
> fit2, but it is 'NA' in fit1.
> # Does anyone have ideas about the difference between fit1 and fit2, and the
> meaning of the NAs?
>
> Thanks very much for your time.
>
> Zachary
>
--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Dr. Gavin Simpson [t] +44 (0)20 7679 0522
ECRC, UCL Geography, [f] +44 (0)20 7679 0565
Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%