[R] Multiple regression in R - unstandardised coefficients are a different sign to standardised coefficients, is this correct?

JC Matthews J.C.Matthews at bristol.ac.uk
Mon Aug 22 17:37:40 CEST 2011


Hello,

I have a statistical problem that I am using R for, but I am not making 
sense of the results. I am trying to use multiple regression to explore 
which variables (weather conditions) have the greater effect on a local 
atmospheric variable. The data is taken from a database that has 20391 data 
points (Z1).

A simplified version of the data I'm looking at is given below. My problem 
is that the regression coefficients and the standardised regression 
coefficients disagree in sign. Intuitively I would expect both to have the 
same sign, but for many of the parameters they do not.

I am aware that some people strongly discourage the use of standardised 
regression coefficients, but I would nevertheless like to see the results, 
not least because the disagreement has made me doubt the non-standardised 
values of B that R has given me.

The code I have used, and some of the data, is as follows (once the 
database has been imported from SQL, and outliers removed).



Z1sub <- Z1[, c(2, 5, 7, 11, 12, 13, 15, 16)]
colnames(Z1sub) <- c("temp", "hum", "wind", "press", "rain", "s.rad", 
"mean1", "sd1" )

attach(Z1sub)
names(Z1sub)


Model1d <- lm(mean1 ~ hum*wind*rain +  I(hum^2) + I(wind^2) + I(rain^2) )

summary(Model1d)

Call:
lm(formula = mean1 ~ hum * wind * rain + I(hum^2) + I(wind^2) +
    I(rain^2))

Residuals:
     Min       1Q   Median       3Q      Max
-1230.64   -63.17    18.51    97.85  1275.73

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)   -9.243e+02  5.689e+01 -16.246  < 2e-16 ***
hum            2.835e+01  1.468e+00  19.312  < 2e-16 ***
wind           1.236e+02  4.832e+00  25.587  < 2e-16 ***
rain          -3.144e+03  7.635e+02  -4.118 3.84e-05 ***
I(hum^2)      -1.953e-01  9.393e-03 -20.793  < 2e-16 ***
I(wind^2)      6.914e-01  2.174e-01   3.181  0.00147 **
I(rain^2)      2.730e+02  3.265e+01   8.362  < 2e-16 ***
hum:wind      -1.782e+00  5.448e-02 -32.706  < 2e-16 ***
hum:rain       2.798e+01  8.410e+00   3.327  0.00088 ***
wind:rain      6.018e+02  2.146e+02   2.805  0.00504 **
hum:wind:rain -6.606e+00  2.401e+00  -2.751  0.00594 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 180.5 on 20337 degrees of freedom
Multiple R-squared: 0.2394,     Adjusted R-squared: 0.239
F-statistic: 640.2 on 10 and 20337 DF,  p-value: < 2.2e-16





To calculate the standardised coefficients, I used the following:

Z1sub.scaled <- data.frame(scale( Z1sub[,c('temp', 'hum', 'wind', 'press', 
'rain', 's.rad', 'mean1', 'sd1' ) ] ) )

attach(Z1sub.scaled)
names(Z1sub.scaled)


Model1d.sc <- lm(mean1 ~ hum*wind*rain +  I(hum^2) + I(wind^2) + I(rain^2) )

summary(Model1d.sc)

Call:
lm(formula = mean1 ~ hum * wind * rain + I(hum^2) + I(wind^2) +
    I(rain^2))

Residuals:
     Min       1Q   Median       3Q      Max
-5.94713 -0.30527  0.08946  0.47287  6.16503

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)    0.0806858  0.0096614   8.351  < 2e-16 ***
hum           -0.4581509  0.0073456 -62.371  < 2e-16 ***
wind          -0.1995316  0.0073767 -27.049  < 2e-16 ***
rain          -0.1806894  0.0158037 -11.433  < 2e-16 ***
I(hum^2)      -0.1120435  0.0053885 -20.793  < 2e-16 ***
I(wind^2)      0.0172870  0.0054346   3.181  0.00147 **
I(rain^2)      0.0040575  0.0004853   8.362  < 2e-16 ***
hum:wind      -0.2188729  0.0066659 -32.835  < 2e-16 ***
hum:rain       0.0267420  0.0146201   1.829  0.06740 .
wind:rain      0.0365615  0.0122335   2.989  0.00281 **
hum:wind:rain -0.0438790  0.0159479  -2.751  0.00594 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.8723 on 20337 degrees of freedom
Multiple R-squared: 0.2394,     Adjusted R-squared: 0.239
F-statistic: 640.2 on 10 and 20337 DF,  p-value: < 2.2e-16



So for humidity (hum), for instance, B = 28.35 +/- 1.468 while 
Beta = -0.4581509 +/- 0.0073456, which is concerning. Is this normal, or is 
there an error in my code that has caused this contradiction?
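
In case it helps anyone reproduce the pattern without the database, here is 
a minimal sketch on simulated data. The variables x, z and y and every 
number in it are made up and are not taken from Z1. It assumes that the 
centring done by scale(), combined with an interaction term, is what changes 
the lower-order coefficients; I have not confirmed that this is what 
happens with the real data.

```r
# Simulated data only: x, z, y and all numbers are hypothetical, not from Z1.
set.seed(1)
n <- 1000
x <- rnorm(n, mean = 10, sd = 2)
z <- rnorm(n, mean = 50, sd = 5)
y <- 5 + 2 * x + 1 * z - 0.1 * x * z + rnorm(n, sd = 1)

raw    <- data.frame(x = x, z = z, y = y)
scaled <- data.frame(scale(raw))  # each column centred to mean 0, scaled to sd 1

# Same formula on both data sets, passing data= instead of using attach()
fit.raw <- lm(y ~ x * z, data = raw)
fit.sc  <- lm(y ~ x * z, data = scaled)

# Main effect of x: in the raw fit it is the slope of y on x at z = 0,
# in the scaled fit it is the slope at z = mean(z), in sd units.
coef(fit.raw)["x"]
coef(fit.sc)["x"]

# Slope of y on x at the mean of z, computed from the raw fit.
# It should have the same sign as the scaled coefficient above.
coef(fit.raw)["x"] + coef(fit.raw)["x:z"] * mean(raw$z)

# The highest-order term is only rescaled, so its t value is unchanged.
summary(fit.raw)$coefficients["x:z", "t value"]
summary(fit.sc)$coefficients["x:z", "t value"]
```

If the same reasoning carries over to my model, the raw coefficient for hum 
would be its slope when wind and rain are both zero, whereas the scaled one 
would be its slope at the mean wind and rain, so the two need not share a 
sign. That would also fit the t values for hum:wind:rain being identical in 
both of my summaries.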

Many thanks,

James.


----------------------
JC Matthews
School of Chemistry
Bristol University


