[R-sig-eco] Multiple regression

Gavin Simpson gavin.simpson at ucl.ac.uk
Tue Jun 1 10:52:43 CEST 2010


On Mon, 2010-05-31 at 13:44 +1000, Zhongkui Luo wrote:
> Dear all,
> 
> # I have the following sub-datasets of soil carbon change at four sites.
> There are four treatments at each site.
> 
> DltC <- c(-19.237, -14.857, -14.818, -14.815, -11.014, 3.349, 4.332, 3.956,
> -7.638, 9.469, 14.189, 13.037, -9.809, 5.459, 8.748, 11.511)

What you are attempting makes very little sense, especially considering
the small size of your data set. You can't fit such a complex model
because you don't have enough information to estimate all the
parameters; hence the NAs and the statement:

Coefficients: (9 not defined because of singularities)

in the printed output.
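
To make the "not enough information" point concrete, here is a minimal
sketch using the data as posted. The object names (X, n_columns,
n_distinct, n_estimated, n_aliased) are mine, and I have written the three
per-site fractions with rep() for brevity; f.Inert_C is copied as posted.
The idea is simply to compare how many parameters the formula asks for
with how many distinct predictor combinations the data contain, since the
latter bounds how many coefficients lm() can estimate.

```r
# Data as posted; f.Inert_C is copied verbatim, including the slightly
# different values within sites 2 to 4.
DltC <- c(-19.237, -14.857, -14.818, -14.815, -11.014, 3.349, 4.332, 3.956,
          -7.638, 9.469, 14.189, 13.037, -9.809, 5.459, 8.748, 11.511)
f.BIOM    <- rep(c(0.0294, 0.0169, 0.0172, 0.0208), each = 4)
f.FOM     <- rep(c(0.183, 0.0223, 0.0168, 0.00766), each = 4)
f.HUM     <- rep(c(0.589, 0.494, 0.432, 0.638), each = 4)
f.Inert_C <- c(0.197, 0.197, 0.197, 0.197, 0.466, 0.466, 0.466, 0.4666,
               0.5336, 0.533, 0.533, 0.533, 0.333, 0.333, 0.3333, 0.3333)

# Number of parameters the full four-way interaction model asks for
X <- model.matrix(~ f.BIOM * f.FOM * f.Inert_C * f.HUM)
n_columns <- ncol(X)

# Number of distinct predictor combinations actually observed
n_distinct <- nrow(unique(cbind(f.BIOM, f.FOM, f.Inert_C, f.HUM)))

# What lm() manages to estimate, and what it gives up on
fit1 <- lm(DltC ~ f.BIOM * f.FOM * f.Inert_C * f.HUM)
n_estimated <- fit1$rank
n_aliased   <- sum(is.na(coef(fit1)))

c(columns = n_columns, distinct = n_distinct,
  estimated = n_estimated, aliased = n_aliased)
```

The number of estimated coefficients cannot exceed the number of distinct
predictor combinations, however many rows of response data there are.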

As to why different terms are NA in the two fits, this is explained in
?model.matrix (linked to from ?lm); read the bit about which variable
varies fastest.
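
A rough way to see the effect of term order, again as a sketch with the
posted data (na1, na2 and same_fit are names I made up): lm() works
through the columns of the model matrix in order and reports NA for any
column that adds no new information beyond the columns already used. So
reordering the variables changes which terms end up NA, but it should not
change the fitted values.

```r
DltC <- c(-19.237, -14.857, -14.818, -14.815, -11.014, 3.349, 4.332, 3.956,
          -7.638, 9.469, 14.189, 13.037, -9.809, 5.459, 8.748, 11.511)
f.BIOM    <- rep(c(0.0294, 0.0169, 0.0172, 0.0208), each = 4)
f.FOM     <- rep(c(0.183, 0.0223, 0.0168, 0.00766), each = 4)
f.HUM     <- rep(c(0.589, 0.494, 0.432, 0.638), each = 4)
f.Inert_C <- c(0.197, 0.197, 0.197, 0.197, 0.466, 0.466, 0.466, 0.4666,
               0.5336, 0.533, 0.533, 0.533, 0.333, 0.333, 0.3333, 0.3333)

# Same model, variables entered in a different order
fit1 <- lm(DltC ~ f.BIOM * f.FOM * f.Inert_C * f.HUM)
fit2 <- lm(DltC ~ f.Inert_C * f.HUM * f.BIOM * f.FOM)

# Which terms were dropped in each fit
na1 <- names(coef(fit1))[is.na(coef(fit1))]
na2 <- names(coef(fit2))[is.na(coef(fit2))]
na1
na2

# The two fits describe the same fitted surface, up to rounding error
same_fit <- isTRUE(all.equal(fitted(fit1), fitted(fit2), tolerance = 1e-6))
same_fit
```

In other words, the individual coefficients and their p-values in either
table are an artefact of the ordering and should not be interpreted.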

I'm not sure what you were hoping to achieve by throwing the kitchen
sink of a model at your data, but in many (most?) cases this is not
advisable. I presume you'll want to simplify the model by dropping
"insignificant" terms (for some definition of "insignificant"). You must
take great care not to get into data dredging. Alternatives to this
model selection process are available (e.g. ridge regression, the lasso)
and should be used if you aren't able to /a priori/ specify a reasonable
model. This note in Ecology Letters makes reference to these techniques:

http://www3.interscience.wiley.com/journal/123356334/abstract
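
For what it is worth, this is roughly what a ridge fit looks like in R.
It is a sketch only: the use of lm.ridge() from the MASS package, the
main-effects-only formula, and the grid of lambda values are my choices
for illustration, not a recommended analysis of these data.

```r
library(MASS)  # recommended package shipped with R; provides lm.ridge()

dat <- data.frame(
  DltC = c(-19.237, -14.857, -14.818, -14.815, -11.014, 3.349, 4.332, 3.956,
           -7.638, 9.469, 14.189, 13.037, -9.809, 5.459, 8.748, 11.511),
  f.BIOM    = rep(c(0.0294, 0.0169, 0.0172, 0.0208), each = 4),
  f.FOM     = rep(c(0.183, 0.0223, 0.0168, 0.00766), each = 4),
  f.HUM     = rep(c(0.589, 0.494, 0.432, 0.638), each = 4),
  f.Inert_C = c(0.197, 0.197, 0.197, 0.197, 0.466, 0.466, 0.466, 0.4666,
                0.5336, 0.533, 0.533, 0.533, 0.333, 0.333, 0.3333, 0.3333)
)

# Arbitrary illustrative grid of penalties (an assumption, not a default)
lambdas <- seq(0.5, 10, by = 0.5)

# Main effects only; the penalty shrinks the coefficients towards zero
ridge <- lm.ridge(DltC ~ f.BIOM + f.FOM + f.Inert_C + f.HUM,
                  data = dat, lambda = lambdas)

# One row of coefficients per lambda (intercept plus four slopes)
ridge_coefs <- coef(ridge)
dim(ridge_coefs)

# Penalty with the smallest generalised cross-validation score
best_lambda <- lambdas[which.min(ridge$GCV)]
best_lambda
```

Shrinkage stabilises the estimates, but it cannot create information that
is not in the data.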

However, I doubt these will help in this particular case; your data set
is likely much too small to evaluate all the terms and interactions you
want to evaluate. Collect some more data and rethink the
processes/effects you are trying to capture with your statistical model,
and then include only those terms.

HTH

G

> 
> # Soil C fractions at the start of the experiment at the four sites are:
> 
> # Four treatments have the same initial soil C fraction
> f.BIOM <- c(0.0294, 0.0294, 0.0294, 0.0294, 0.0169, 0.0169, 0.0169,
> 0.0169,0.0172, 0.0172, 0.0172, 0.0172, 0.0208, 0.0208, 0.0208, 0.0208)
> f.FOM <- c(0.183, 0.183, 0.183, 0.183,0.0223, 0.0223, 0.0223, 0.0223,0.0168,
> 0.0168, 0.0168, 0.0168, 0.00766, 0.00766, 0.00766, 0.00766)
> f.Inert_C <- c(0.197, 0.197, 0.197, 0.197,0.466, 0.466, 0.466,
> 0.4666,0.5336, 0.533, 0.533, 0.533,0.333, 0.333, 0.3333, 0.3333)
> f.HUM <- c(0.589, 0.589, 0.589, 0.589,0.494, 0.494, 0.494, 0.494,0.432,
> 0.432, 0.432, 0.432,0.638, 0.638, 0.638, 0.638)
> 
> # Applying multiple regression model to the data:
> 
> fit1 <- lm(DltC ~f.BIOM*f.FOM*f.Inert_C*f.HUM)
> # Just change the order of the four variables
> fit2 <- lm(DltC~f.Inert_C*f.HUM*f.BIOM*f.FOM)
> summary(fit1)
> summary(fit2)
> 
> 
> # Coefficients of fit1:
> 
> Coefficients: (9 not defined because of singularities)
>                              Estimate Std. Error t value Pr(>|t|)
> (Intercept)                    206698      54914   3.764  0.00446 **
> f.BIOM                       -4477270    1716046  -2.609  0.02831 *
> f.FOM                        -2227245     794993  -2.802  0.02066 *
> f.Inert_C                     -996522     280627  -3.551  0.00621 **
> f.HUM                         -172647      56211  -3.071  0.01332 *
> f.BIOM:f.FOM                       NA         NA      NA       NA
> f.BIOM:f.Inert_C             46171278   12725547   3.628  0.00550 **
> f.FOM:f.Inert_C              10074999    3442541   2.927  0.01685 *
> f.BIOM:f.HUM                       NA         NA      NA       NA
> f.FOM:f.HUM                        NA         NA      NA       NA
> f.Inert_C:f.HUM                    NA         NA      NA       NA
> f.BIOM:f.FOM:f.Inert_C             NA         NA      NA       NA
> f.BIOM:f.FOM:f.HUM                 NA         NA      NA       NA
> f.BIOM:f.Inert_C:f.HUM             NA         NA      NA       NA
> f.FOM:f.Inert_C:f.HUM              NA         NA      NA       NA
> f.BIOM:f.FOM:f.Inert_C:f.HUM       NA         NA      NA       NA
> 
> # Coefficients of fit2:
> Coefficients: (9 not defined because of singularities)
>                               Estimate Std. Error t value Pr(>|t|)
> (Intercept)                      21052      36457   0.577   0.5778
> f.Inert_C                       -54158     112672  -0.481   0.6422
> f.HUM                          -291540      96625  -3.017   0.0145 *
> f.BIOM                         7364603    4214425   1.747   0.1145
> f.FOM                          -242470     123836  -1.958   0.0819 .
> f.Inert_C:f.HUM                 603519     206217   2.927   0.0168 *
> f.Inert_C:f.BIOM             -13939751   10567698  -1.319   0.2197
> f.HUM:f.BIOM                        NA         NA      NA       NA
> f.Inert_C:f.FOM                     NA         NA      NA       NA
> f.HUM:f.FOM                         NA         NA      NA       NA
> f.BIOM:f.FOM                        NA         NA      NA       NA
> f.Inert_C:f.HUM:f.BIOM              NA         NA      NA       NA
> f.Inert_C:f.HUM:f.FOM               NA         NA      NA       NA
> f.Inert_C:f.BIOM:f.FOM              NA         NA      NA       NA
> f.HUM:f.BIOM:f.FOM                  NA         NA      NA       NA
> f.Inert_C:f.HUM:f.BIOM:f.FOM        NA         NA      NA       NA
> 
> 
> # Compared with fit1, the coefficients in fit2 are quite different, e.g., the
> effect of the interaction between f.Inert_C and f.HUM is significant in
> fit2, but it is 'NA' in fit1.
> # Does anyone have ideas about the difference between fit1 and fit2, and the
> meaning of the NAs?
> 
> Thanks very much for your time.
> 
> Zachary
> 

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%


