[R] difference between linear model & scatterplot matrix
Francesco Nutini
nutini.francesco at gmail.com
Fri Dec 3 15:42:46 CET 2010
Dear R-users,
I'm studying a database structured like this (just a small part of my dataset):
__________________________________________________________________________
Site        Latitude   Longitude   Year  Tot-Prod  Total_Density  dmp
Dendoudi-1  15.441964  -13.540179  2005  3271.16   1007           16993.25
Dendoudi-2  15.397321  -13.611607  2005  1616.84   250            25376.67
…           …          …           …     …         …              …
__________________________________________________________________________
If I draw a scatterplot matrix with the commands shown below, I obtain a matrix (visible in the image) that shows which variables are most correlated with dmp (violet color).
But if I fit a linear model with dmp as the dependent variable and many independent variables, I get different information about the significance of the variables.
I mean, variables that appear correlated with the dependent variable in the matrix are not significant in the summary of the linear model, and vice versa. Have I made a mistake in interpreting the results?
Thank you in advance,
Francesco
#command for matrix-plot
> library(gclus)   # provides dmat.color(), order.single() and cpairs()
> dta <- senegal5[c(2,4,5,6,7,8,9,13,15,17,21,39,44,45)]
> dta.r <- abs(cor(dta))
> dta.col <- dmat.color(dta.r)
> dta.o <- order.single(dta.r)
> cpairs(dta, dta.o, panel.colors=dta.col, gap=.5,
+        main="Variables Ordered and Colored by Correlation")
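The colored matrix encodes pairwise (marginal) correlations, so the same information can be listed as numbers and compared with the model summary. A minimal sketch, assuming the built-in mtcars data as a stand-in for senegal5, with mpg playing the role of dmp; the column choice is purely illustrative:

```r
# Stand-in data (NOT senegal5): built-in mtcars, mpg plays the role of dmp
dta <- mtcars[c("mpg", "disp", "hp", "wt", "qsec")]

# Full correlation matrix, as used for the colored scatterplot matrix
r <- cor(dta)

# Absolute correlation of each predictor with the response, largest first
marginal <- sort(abs(r[-1, "mpg"]), decreasing = TRUE)
print(round(marginal, 3))
```

Each value here looks at one predictor at a time and ignores all the others, which is the key difference from the coefficients of a model that contains every predictor at once.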
#command for linear model and summary()
> a <- lm(dmp ~ Latitude + Longitude + Year + Tot.Prod + Herbaceous.Prod.kg.ha. +
+         Leaf.Prod + Tree.bio + Total_Density + X1st.SpecieDensity.trunk.ha. +
+         X2nd.SpecieDensity.trunk.ha. + Herb_Specie_Index1 + iNDVI.JASO. +
+         RFE.Cum.JASO., data=senegal5)
> summary(a)
Call:
lm(formula = dmp ~ Latitude + Longitude + Year + Tot.Prod + Herbaceous.Prod.kg.ha. +
    Leaf.Prod + Tree.bio + Total_Density + X1st.SpecieDensity.trunk.ha. +
    X2nd.SpecieDensity.trunk.ha. + Herb_Specie_Index1 + iNDVI.JASO. +
    RFE.Cum.JASO., data = senegal5)
Residuals:
    Min      1Q  Median      3Q     Max
-676.49 -195.77  -33.06  113.34  816.17
Coefficients:
                               Estimate Std. Error t value Pr(>|t|)
(Intercept)                  -3.283e+05  4.505e+04  -7.288 4.41e-11 ***
Latitude                     -6.100e+01  1.990e+02  -0.307   0.7598
Longitude                    -3.617e+02  8.639e+01  -4.187 5.60e-05 ***
Year                          1.604e+02  2.300e+01   6.973 2.15e-10 ***
Tot.Prod                     -4.893e+00  1.565e+02  -0.031   0.9751
Herbaceous.Prod.kg.ha.        4.905e+00  1.565e+02   0.031   0.9751
Leaf.Prod                     4.842e+00  1.565e+02   0.031   0.9754
Tree.bio                     -4.241e+01  2.771e+02  -0.153   0.8786
Total_Density                -1.930e+00  8.933e-01  -2.160   0.0329 *
X1st.SpecieDensity.trunk.ha.  1.992e+00  9.246e-01   2.154   0.0333 *
X2nd.SpecieDensity.trunk.ha.  3.416e+00  1.642e+00   2.080   0.0398 *
Herb_Specie_Index1           -1.091e+00  1.844e+00  -0.592   0.5552
iNDVI.JASO.                   8.914e+02  6.076e+01  14.670  < 2e-16 ***
RFE.Cum.JASO.                 2.525e+00  4.529e-01   5.575 1.68e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 295.3 on 114 degrees of freedom
Multiple R-squared: 0.9206, Adjusted R-squared: 0.9116
F-statistic: 101.7 on 13 and 114 DF, p-value: < 2.2e-16
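One possible reason for the disagreement (an assumption, not verified on senegal5) is collinearity among the predictors. In the summary above, Tot.Prod, Herbaceous.Prod.kg.ha. and Leaf.Prod share the same standard error (1.565e+02) and have coefficients of nearly equal size, which would be consistent with these variables carrying almost the same information. A minimal sketch on simulated data, where x1, x2 and y are hypothetical variables, shows how a predictor can be strongly correlated with the response on its own and still get a very uncertain coefficient in a joint model:

```r
set.seed(1)
n  <- 128                        # 114 residual df + 14 coefficients, as in the summary
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.01)   # hypothetical predictor, nearly a copy of x1
y  <- x1 + rnorm(n)

# Marginal view: each predictor alone is clearly correlated with y
marginal <- c(x1 = cor(y, x1), x2 = cor(y, x2))
print(round(marginal, 3))

# Joint view: each coefficient is tested after adjusting for the other predictor
fit_single <- lm(y ~ x1)
fit_joint  <- lm(y ~ x1 + x2)

# The near-duplication inflates the standard error of x1 in the joint model
se_single <- coef(summary(fit_single))["x1", "Std. Error"]
se_joint  <- coef(summary(fit_joint))["x1", "Std. Error"]
print(c(single = se_single, joint = se_joint))

print(coef(summary(fit_joint)))
```

The t-tests in summary() ask whether a variable adds anything once all the other variables are already in the model, while the scatterplot matrix looks at one pair at a time, so the two views need not agree. Running cor() on the predictors alone is one way to check whether this applies to the real data.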