[R] regression with categorical variables

Sat Jan 30 17:43:25 CET 2010

kayj wrote:
> Hi All,
> 
> I am working on an example where the electric utility is investigating the
> effect of size of household and the type of air conditioning on electricity
> consumption. I fit a multiple linear regression:
> 
> Electricity consumption = size of the household + air conditioning type
> 
> There are 3 air conditioning types, so I modeled them with dummy variables:
> Type A
> Type B
> Type C
> 
> where Type A is the reference.
> 
> The results are:
> 
> Electricity consumption = 0.4 size of the household + 0.95 Type B - 0.95 Type C
> 
> But when I look at the mean of the predicted values of electricity
> consumption by air conditioning type, this is what I get
> 
> Type A  29.86
> Type B  25.94
> Type C  30.1
> 
> I calculated the above means by fitting a linear model, Electricity
> consumption = size of the household, without including the air conditioning
> type. I then took the predicted values of the response variable and calculated
> the mean of the predicted values for each category. You can see that the
> mean response for Type B is lower than for Type A (25.94 for Type B and 29.86
> for Type A).
> 
> 
> My question is that the signs of the betas in the regression model are not
> consistent with the means: for Type B the beta is positive (0.95).
> 
> Is this possible? Under what circumstances can this happen?

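Certainly, this is possible. Your simpler model is a
'coincident straight lines' model (one line for all three
types), which may not be at all reasonable. The model
including type of a.c. is a 'parallel straight lines' model
(a common slope, but a separate intercept for each type).
What if type B is used primarily in small households? Then
its mean predicted value from the simpler model reflects
household size only, whatever the sign of its coefficient
in the fuller model. Have you plotted the data?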

Here's an example:

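# Household size: 200 households, sizes 1 to 8
x1 <- rep(1:8, c(10,30,70,60,12,12,3,3))
# A.c. type: B goes with the smallest households, C with the largest
x2 <- factor(rep(LETTERS[c(2,1,3)], c(40,130,30)))
set.seed(1234)
# True model: common slope 0.4, type B shifted up by 1, type C down by 1
y <- .4*x1 + 1*(x2=='B') - 1*(x2=='C') + .2*rnorm(200)
model1 <- lm(y ~ x1)         # coincident lines: type ignored
model2 <- lm(y ~ x1 + x2)    # parallel lines: one intercept per type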
round(coef(model2), 4)
#(Intercept)          x1         x2B         x2C
#     0.0005      0.4013      0.9143     -0.9977

yp <- predict(model1)
tapply(yp, x2, mean)
#        A        B        C
# 1.431830 1.386754 1.496052
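
The group means above can be traced back to household size alone. The
following is only a sketch on the same simulated data (the names mean_x1,
from_x1 and from_fit are mine, introduced for illustration): because model1
ignores type, the mean of its fitted values within a type is just
intercept + slope * (mean household size for that type).

```r
# Same simulated data as in the example above.
x1 <- rep(1:8, c(10, 30, 70, 60, 12, 12, 3, 3))
x2 <- factor(rep(LETTERS[c(2, 1, 3)], c(40, 130, 30)))
set.seed(1234)
y <- 0.4 * x1 + 1 * (x2 == "B") - 1 * (x2 == "C") + 0.2 * rnorm(200)

# Mean household size within each type.
# By construction: A = 450/130 (about 3.46), B = 1.75, C = 5.9.
mean_x1 <- tapply(x1, x2, mean)

# One-predictor fit: type is not in the model at all.
model1 <- lm(y ~ x1)
b <- coef(model1)

# Group means of fitted values, computed two ways.
from_x1  <- b[["(Intercept)"]] + b[["x1"]] * mean_x1  # from mean size only
from_fit <- tapply(predict(model1), x2, mean)         # as in the example

print(round(mean_x1, 2))
print(round(from_x1, 4))
print(round(from_fit, 4))
```

The two sets of means agree, so the ordering of the types in that table
says something about which households use which type, not about the
effect of the type itself.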

# Have a look at the data with:

plot(y ~ x1, type='n')
points(y ~ x1, subset = x2=='A')
points(y ~ x1, subset = x2=='B', col=4)
points(y ~ x1, subset = x2=='C', col=2)
abline(lm(y ~ x1, subset = x2=='A'))
abline(lm(y ~ x1, subset = x2=='B'), col=4)
abline(lm(y ~ x1, subset = x2=='C'), col=2)
abline(model1, lwd=2)
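
If means by type that agree with the coefficients are wanted, one option
is to compare the types at a common household size using the parallel
lines model. This is a sketch only, on the same simulated data; the names
grid and adj are mine, and predicting at the overall mean of x1 is just
one convenient choice of common size.

```r
# Same simulated data as in the example above.
x1 <- rep(1:8, c(10, 30, 70, 60, 12, 12, 3, 3))
x2 <- factor(rep(LETTERS[c(2, 1, 3)], c(40, 130, 30)))
set.seed(1234)
y <- 0.4 * x1 + 1 * (x2 == "B") - 1 * (x2 == "C") + 0.2 * rnorm(200)

model2 <- lm(y ~ x1 + x2)

# One hypothetical household of average size for each a.c. type.
grid <- data.frame(x1 = mean(x1),
                   x2 = factor(levels(x2), levels = levels(x2)))

# Predicted consumption for each type at that common size.
adj <- predict(model2, newdata = grid)
names(adj) <- levels(x2)

print(round(adj, 3))
# Differences from type A equal the x2B and x2C coefficients.
print(round(adj - adj["A"], 3))
```

At a fixed household size the ordering is B above A above C, which is
what the signs of the coefficients say.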

  -Peter Ehlers

> 
> I appreciate your input.
> 

-- 
Peter Ehlers
University of Calgary