[R] difference between linear model & scatterplot matrix
Jonathan Christensen
dzhonatan at gmail.com
Fri Dec 3 20:27:26 CET 2010
Francesco,
My guess would be collinearity among the predictors. The linear model
gives you the best fit to all of the predictors at once; unless the
predictors are orthogonal (which they almost certainly are not in a
case like this), there is no guarantee that the parameter estimates
giving the best overall fit will resemble the coefficients you would
get by regressing the response on each predictor individually.
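As a minimal sketch of how this can happen, here is a small simulation.
The data are simulated, not yours, and the names x1, x2 and y are
hypothetical: x2 is built to be strongly correlated with x1, while y
depends on x1 only.

```r
# Simulated illustration only; x1, x2 and y are hypothetical variables.
set.seed(1)
n  <- 1000
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)   # x2 is strongly correlated with x1
y  <- 2 * x1 + rnorm(n)         # y depends on x1 only

# Pairwise correlations: y looks clearly correlated with both x1 and x2
print(round(cor(cbind(y, x1, x2)), 2))

# Regressing y on x2 alone gives a slope far from zero ...
fit_single <- lm(y ~ x2)
print(coef(fit_single))

# ... but in the joint fit the x2 slope is close to zero,
# because x1 already accounts for what x2 appeared to explain.
fit_joint <- lm(y ~ x1 + x2)
print(summary(fit_joint)$coefficients)
```

A scatterplot matrix only shows the pairwise picture, so it would flag x2
as related to y here, while the joint model would not.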
There are various ways to check collinearity, such as variance
inflation factors (VIF). You may want to look into them. It's very
dangerous to try to interpret your parameter estimates in the presence
of collinearity.
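A rough sketch of computing VIFs by hand in base R follows. For each
predictor, VIF = 1 / (1 - R^2), where R^2 comes from regressing that
predictor on all the others. The helper name vif_manual and the simulated
data frame are my own assumptions for illustration; the car package also
provides a vif() function that works on a fitted lm object.

```r
# vif_manual is a hypothetical helper, not part of any package.
# data: a data frame; predictors: character vector of column names.
vif_manual <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    # Regress predictor p on all the other predictors
    fit <- lm(reformulate(others, response = p), data = data)
    1 / (1 - summary(fit)$r.squared)
  })
}

# Simulated example data (not the senegal5 data)
set.seed(1)
n <- 500
d <- data.frame(x1 = rnorm(n))
d$x2 <- d$x1 + rnorm(n, sd = 0.3)  # nearly collinear with x1
d$x3 <- rnorm(n)                   # unrelated to the others

vifs <- vif_manual(d, c("x1", "x2", "x3"))
print(round(vifs, 2))  # x1 and x2 should be large, x3 close to 1
```

On your data the call would look something like
vif_manual(senegal5, c("Latitude", "Longitude", "Year", ...)) with the
predictor names from your model. Values well above 5 or 10 are commonly
treated as a sign of problematic collinearity, though that threshold is
only a rule of thumb.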
Jonathan
On Fri, Dec 3, 2010 at 7:42 AM, Francesco Nutini
<nutini.francesco at gmail.com> wrote:
> Dear R-users,
> I'm studying a database structured like this (just a small part of my dataset):
> ___________________________________________________________________________
>
> Site        Latitude   Longitude   Year  Tot-Prod  Total_Density  dmp
> Dendoudi-1  15.441964  -13.540179  2005  3271.16   1007           16993.25
> Dendoudi-2  15.397321  -13.611607  2005  1616.84   250            25376.67
> …           …          …           …     …         …              …
> ___________________________________________________________________________
>
> If I make a scatterplot matrix with the command shown below, I obtain a matrix (visible in the image) that shows which variables are most correlated with the dmp data (violet color).
> But if I fit a linear model between the dependent variable (dmp) and many independent variables,
> I get different information about the significance of the variables.
> I mean, variables that appear correlated with the dependent variable in the matrix are not significant in the summary of the linear model, and vice versa. Have I made a mistake in interpreting the result?
>
> Thank you in advance,
> Francesco
>
>
>
> # command for matrix plot
> # (dmat.color, order.single and cpairs come from the gclus package)
> dta <- senegal5[c(2, 4, 5, 6, 7, 8, 9, 13, 15, 17, 21, 39, 44, 45)]
> dta.r <- abs(cor(dta))
> dta.col <- dmat.color(dta.r)
> dta.o <- order.single(dta.r)
> cpairs(dta, dta.o, panel.colors = dta.col, gap = .5,
>        main = "Variables Ordered and Colored by Correlation")
>
> # command for linear model and summary()
> a <- lm(dmp ~ Latitude + Longitude + Year + Tot.Prod +
>           Herbaceous.Prod.kg.ha. + Leaf.Prod + Tree.bio + Total_Density +
>           X1st.SpecieDensity.trunk.ha. + X2nd.SpecieDensity.trunk.ha. +
>           Herb_Specie_Index1 + iNDVI.JASO. + RFE.Cum.JASO.,
>         data = senegal5)
>
> summary(a)
>
> Call:
> lm(formula = dmp ~ Latitude + Longitude + Year + Tot.Prod + Herbaceous.Prod.kg.ha. +
>     Leaf.Prod + Tree.bio + Total_Density + X1st.SpecieDensity.trunk.ha. +
>     X2nd.SpecieDensity.trunk.ha. + Herb_Specie_Index1 + iNDVI.JASO. +
>     RFE.Cum.JASO., data = senegal5)
>
> Residuals:
>     Min      1Q  Median      3Q     Max
> -676.49 -195.77  -33.06  113.34  816.17
>
> Coefficients:
>                                Estimate Std. Error t value Pr(>|t|)
> (Intercept)                  -3.283e+05  4.505e+04  -7.288 4.41e-11 ***
> Latitude                     -6.100e+01  1.990e+02  -0.307   0.7598
> Longitude                    -3.617e+02  8.639e+01  -4.187 5.60e-05 ***
> Year                          1.604e+02  2.300e+01   6.973 2.15e-10 ***
> Tot.Prod                     -4.893e+00  1.565e+02  -0.031   0.9751
> Herbaceous.Prod.kg.ha.        4.905e+00  1.565e+02   0.031   0.9751
> Leaf.Prod                     4.842e+00  1.565e+02   0.031   0.9754
> Tree.bio                     -4.241e+01  2.771e+02  -0.153   0.8786
> Total_Density                -1.930e+00  8.933e-01  -2.160   0.0329 *
> X1st.SpecieDensity.trunk.ha.  1.992e+00  9.246e-01   2.154   0.0333 *
> X2nd.SpecieDensity.trunk.ha.  3.416e+00  1.642e+00   2.080   0.0398 *
> Herb_Specie_Index1           -1.091e+00  1.844e+00  -0.592   0.5552
> iNDVI.JASO.                   8.914e+02  6.076e+01  14.670  < 2e-16 ***
> RFE.Cum.JASO.                 2.525e+00  4.529e-01   5.575 1.68e-07 ***
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 295.3 on 114 degrees of freedom
> Multiple R-squared: 0.9206, Adjusted R-squared: 0.9116
> F-statistic: 101.7 on 13 and 114 DF, p-value: < 2.2e-16
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>