[R] Multiple regression in R - unstandardised coefficients are a different sign to standardised coefficients, is this correct?
JC Matthews
J.C.Matthews at bristol.ac.uk
Mon Aug 22 17:37:40 CEST 2011
Hello,
I am using R for a statistical problem, but I cannot make sense of the
results. I am trying to use multiple regression to explore which variables
(weather conditions) have the greater effect on a local atmospheric
variable. The data are taken from a database (Z1) that has 20391 data
points.
A simplified version of the analysis is given below. My problem is that the
regression coefficients and the standardised regression coefficients
disagree in sign. Intuitively I would expect both to have the same sign,
but for many of the parameters they do not.
I am aware that some people strongly discourage the use of standardised
regression coefficients, but I would nevertheless like to see the results,
not least because the disagreement has made me doubt the non-standardised
values of B that R has given me.
The code I have used, and some of the data, is as follows (once the
database has been imported from SQL, and outliers removed).
Z1sub <- Z1[, c(2, 5, 7, 11, 12, 13, 15, 16)]
colnames(Z1sub) <- c("temp", "hum", "wind", "press", "rain", "s.rad",
                     "mean1", "sd1")
attach(Z1sub)
names(Z1sub)
Model1d <- lm(mean1 ~ hum*wind*rain + I(hum^2) + I(wind^2) + I(rain^2) )
summary(Model1d)
Call:
lm(formula = mean1 ~ hum * wind * rain + I(hum^2) + I(wind^2) +
I(rain^2))
Residuals:
     Min       1Q   Median       3Q      Max
-1230.64   -63.17    18.51    97.85  1275.73
Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)   -9.243e+02  5.689e+01 -16.246  < 2e-16 ***
hum            2.835e+01  1.468e+00  19.312  < 2e-16 ***
wind           1.236e+02  4.832e+00  25.587  < 2e-16 ***
rain          -3.144e+03  7.635e+02  -4.118 3.84e-05 ***
I(hum^2)      -1.953e-01  9.393e-03 -20.793  < 2e-16 ***
I(wind^2)      6.914e-01  2.174e-01   3.181  0.00147 **
I(rain^2)      2.730e+02  3.265e+01   8.362  < 2e-16 ***
hum:wind      -1.782e+00  5.448e-02 -32.706  < 2e-16 ***
hum:rain       2.798e+01  8.410e+00   3.327  0.00088 ***
wind:rain      6.018e+02  2.146e+02   2.805  0.00504 **
hum:wind:rain -6.606e+00  2.401e+00  -2.751  0.00594 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 180.5 on 20337 degrees of freedom
Multiple R-squared: 0.2394, Adjusted R-squared: 0.239
F-statistic: 640.2 on 10 and 20337 DF, p-value: < 2.2e-16
To calculate the standardised coefficients, I used the following:
Z1sub.scaled <- data.frame(scale( Z1sub[,c('temp', 'hum', 'wind', 'press',
'rain', 's.rad', 'mean1', 'sd1' ) ] ) )
attach(Z1sub.scaled)
names(Z1sub.scaled)
Model1d.sc <- lm(mean1 ~ hum*wind*rain + I(hum^2) + I(wind^2) + I(rain^2) )
summary(Model1d.sc)
Call:
lm(formula = mean1 ~ hum * wind * rain + I(hum^2) + I(wind^2) +
I(rain^2))
Residuals:
     Min       1Q   Median       3Q      Max
-5.94713 -0.30527  0.08946  0.47287  6.16503
Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)    0.0806858  0.0096614   8.351  < 2e-16 ***
hum           -0.4581509  0.0073456 -62.371  < 2e-16 ***
wind          -0.1995316  0.0073767 -27.049  < 2e-16 ***
rain          -0.1806894  0.0158037 -11.433  < 2e-16 ***
I(hum^2)      -0.1120435  0.0053885 -20.793  < 2e-16 ***
I(wind^2)      0.0172870  0.0054346   3.181  0.00147 **
I(rain^2)      0.0040575  0.0004853   8.362  < 2e-16 ***
hum:wind      -0.2188729  0.0066659 -32.835  < 2e-16 ***
hum:rain       0.0267420  0.0146201   1.829  0.06740 .
wind:rain      0.0365615  0.0122335   2.989  0.00281 **
hum:wind:rain -0.0438790  0.0159479  -2.751  0.00594 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.8723 on 20337 degrees of freedom
Multiple R-squared: 0.2394, Adjusted R-squared: 0.239
F-statistic: 640.2 on 10 and 20337 DF, p-value: < 2.2e-16
So, for humidity (hum) for instance, B = 28.35 +/- 1.468 while
Beta = -0.4581509 +/- 0.0073456, which is concerning. Is this normal, or is
there an error in my code that has caused this contradiction?
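
One thing that may be worth checking is the centring done by scale(): it
subtracts the mean as well as dividing by the standard deviation. In a model
with interaction terms, the coefficient of hum is the slope of hum where the
other predictors equal zero, so it can refer to a different point before and
after centring. The sketch below is a minimal illustration on simulated data
only. The data frame d, its means and the "true" coefficients are made up
for the example and are not taken from Z1, and it uses a two-way interaction
rather than the full model above.

```r
set.seed(1)
n <- 2000

# Simulated predictors whose means are far from zero (assumed values)
d <- data.frame(hum  = rnorm(n, mean = 80, sd = 10),
                wind = rnorm(n, mean = 5,  sd = 2))

# Assumed true model: the slope of hum is 3 - 1 * wind,
# i.e. +3 at wind = 0 but about -2 at the mean wind of 5
d$mean1 <- 10 + 3 * d$hum + 2 * d$wind - 1 * d$hum * d$wind +
  rnorm(n, sd = 5)

# Same formula fitted to the raw and to the scaled (centred) data
raw <- lm(mean1 ~ hum * wind, data = d)
sc  <- lm(mean1 ~ hum * wind, data = data.frame(scale(d)))

coef(raw)["hum"]  # slope of hum at wind = 0
coef(sc)["hum"]   # slope of hum at the mean wind, in SD units

# The fit itself is the same in both parameterisations
all.equal(summary(raw)$r.squared, summary(sc)$r.squared)

# Slope of hum at the mean wind, computed from the raw coefficients
# and rescaled to SD units; this should reproduce coef(sc)["hum"]
b <- coef(raw)
slope_at_mean <- unname(b["hum"] + b["hum:wind"] * mean(d$wind))
beta_from_raw <- slope_at_mean * sd(d$hum) / sd(d$mean1)
all.equal(beta_from_raw, unname(coef(sc)["hum"]))
```

In this sketch the two hum coefficients have opposite signs although both
fits are the same model. The highest-order term keeps the same t value in
both fits, as hum:wind:rain (-2.751) does in the two summaries above.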
Many thanks,
James.
----------------------
JC Matthews
School of Chemistry
Bristol University