[R] regression with categorical variables
Peter Ehlers
ehlers at ucalgary.ca
Sat Jan 30 17:43:25 CET 2010
kayj wrote:
> Hi All,
>
> I am working on an example in which an electric utility is investigating the
> effect of household size and type of air conditioning on electricity
> consumption. I fit a multiple linear regression:
>
> Electricity consumption = size of the household + air conditioning type
>
> There are 3 air conditioning types, so I modeled them with dummy variables:
> Type A
> Type B
> Type C
>
> where Type A is the reference.
>
> Below are the results
>
> Electricity consumption = 0.4 size of the household + 0.95 type B - 0.95 type C
>
> But when I look at the mean of the predicted values of electricity
> consumption by air conditioning type, this is what I get
>
> Type A 29.86
> Type B 25.94
> Type C 30.1
>
> I calculated the above means by fitting a linear model, Electricity
> consumption = size of the household, without including the air conditioning
> type. I then looked at the predicted values of the response variable and
> calculated the mean of the predicted values for each category. You can see
> that the mean response for Type B is lower than for Type A (25.94 for Type B
> and 29.86 for Type A).
>
>
> My question is that the signs of the betas in the regression model are not
> consistent with the means: for Type B the beta is positive, 0.95.
>
> Is this possible? In what circumstances can this happen?
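Certainly, this is possible. Your simpler model is a
'coincident straight lines' model, which may not be
at all reasonable: its predictions depend only on household
size, so the mean prediction for each type reflects nothing
but the typical household size for that type. The model
including type of a.c. is a 'parallel straight lines' model,
whose coefficients compare types at the same household size.
What if type B is used primarily in small households? Have
you plotted the data?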
Here's an example:
x1 <- rep(1:8, c(10,30,70,60,12,12,3,3))
x2 <- factor(rep(LETTERS[c(2,1,3)], c(40,130,30)))
set.seed(1234)
y <- .4*x1 + 1*(x2=='B') - 1*(x2=='C') + .2*rnorm(200)
model1 <- lm(y ~ x1)
model2 <- lm(y ~ x1 + x2)
round(coef(model2), 4)
# (Intercept)          x1         x2B         x2C
#      0.0005      0.4013      0.9143     -0.9977
yp <- predict(model1)
tapply(yp, x2, mean)
#        A        B        C
# 1.431830 1.386754 1.496052
# Have a look at the data with:
plot(y ~ x1, type='n')
points(y ~ x1, subset={x2=='A'})
points(y ~ x1, subset={x2=='B'}, col=4)
points(y ~ x1, subset={x2=='C'}, col=2)
abline(lm(y ~ x1, subset={x2=='A'}))
abline(lm(y ~ x1, subset={x2=='B'}), col=4)
abline(lm(y ~ x1, subset={x2=='C'}), col=2)
abline(model1, lwd=2)
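
A further numerical check on the same simulated data may help. This is only a sketch: the names mean_x1, implied, newdat and adj are introduced here for illustration and are not part of the example above. It shows that the per-type means of the model1 predictions are just the model1 line evaluated at each type's mean x1, and that comparing types at a common x1 with model2 recovers the ordering given by the coefficients.

```r
# Recreate the simulated data and both models from the example above
x1 <- rep(1:8, c(10,30,70,60,12,12,3,3))
x2 <- factor(rep(LETTERS[c(2,1,3)], c(40,130,30)))
set.seed(1234)
y <- .4*x1 + 1*(x2=='B') - 1*(x2=='C') + .2*rnorm(200)
model1 <- lm(y ~ x1)
model2 <- lm(y ~ x1 + x2)

# Mean of x1 within each type: B has the smallest x1 values, C the largest
mean_x1 <- tapply(x1, x2, mean)
mean_x1
# A is about 3.46, B is 1.75, C is 5.90

# The per-type means of the model1 predictions equal the model1 line
# evaluated at each type's mean x1, so they carry no information about type
by_type <- tapply(predict(model1), x2, mean)
implied <- coef(model1)[1] + coef(model1)[2] * mean_x1
all.equal(as.vector(by_type), as.vector(implied))

# Compare the types at one common x1 (here the overall mean) using model2
newdat <- data.frame(x1 = mean(x1),
                     x2 = factor(levels(x2), levels = levels(x2)))
adj <- setNames(predict(model2, newdata = newdat), levels(x2))
adj
```

At a common x1 the predicted values order as B > A > C, in agreement with the signs of x2B and x2C, whereas the model1 means mostly track how x1 is distributed across the types.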
-Peter Ehlers
>
> I appreciate your input.
>
--
Peter Ehlers
University of Calgary