[R] Creating Dummy Variables in R
Achim Zeileis
Achim.Zeileis at wu-wien.ac.at
Wed Dec 16 16:19:07 CET 2009
On Wed, 16 Dec 2009, whitaker m. (mw1006) wrote:
> Hi,
> I am trying to create a set of dummy variables to use in a multiple linear regression and cannot find how to do this in the manuals.
>
> For example, I have:
> Price  Weight  Clarity
>                IF  VVS1  VVS2
> 500    8       1   0     0
> 1000   5.2     0   0     1
> 864    3       0   1     0
> 340    2.6     0   0     1
> 90     0.5     1   0     0
> 450    2.3     0   1     0
>
> Price depends on weight (a single value in each observation) and on clarity (split into three levels: IF, VVS1, VVS2).
> I am having trouble telling the program that clarity is a set of three dummy variables and keep getting error messages. What is the correct way?
You should code the categorical variable "Clarity" as a "factor" so that R
knows that this is a categorical variable and can deal with it
appropriately in subsequent computations such as summary() or lm().
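If the data are already stored as three 0/1 columns, as in the question,
they can be collapsed into a single factor first. The following is a
minimal sketch, assuming the dummy columns are named IF, VVS1 and VVS2
and that every row has exactly one 1 (the names "raw" and "dummies" are
purely illustrative):

```r
# Hypothetical data laid out as in the question: one 0/1 column per level.
raw <- data.frame(
  Price  = c(500, 1000, 864, 340, 90, 450),
  Weight = c(8, 5.2, 3, 2.6, 0.5, 2.3),
  IF     = c(1, 0, 0, 0, 1, 0),
  VVS1   = c(0, 0, 1, 0, 0, 1),
  VVS2   = c(0, 1, 0, 1, 0, 0)
)

dummies <- as.matrix(raw[, c("IF", "VVS1", "VVS2")])

# Sanity check: each observation must belong to exactly one clarity level.
stopifnot(all(rowSums(dummies) == 1))

# max.col() gives, for each row, the index of the column holding the 1;
# use it to look up the level name and build a factor with fixed levels.
Clarity <- factor(colnames(dummies)[max.col(dummies, ties.method = "first")],
                  levels = colnames(dummies))

dat <- data.frame(Price = raw$Price, Weight = raw$Weight, Clarity = Clarity)
table(dat$Clarity)  # IF: 2, VVS1: 2, VVS2: 2
```

After this, lm() builds the dummy variables itself, so the original 0/1
columns should not be passed to the model as well.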
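Thus, for your example, I would recommend storing the data as
dat <- data.frame(
  Price = c(500, 1000, 864, 340, 90, 450),
  Weight = c(8, 5.2, 3, 2.6, 0.5, 2.3),
  ## factor() makes the categorical coding explicit (since R 4.0.0,
  ## data.frame() no longer converts strings to factors by default)
  Clarity = factor(c("IF", "VVS1", "VVS2")[c(1, 3, 2, 3, 1, 2)]))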
which yields, e.g.,
R> summary(dat)
     Price            Weight       Clarity
 Min.   :  90.0   Min.   :0.500   IF  :2
 1st Qu.: 367.5   1st Qu.:2.375   VVS1:2
 Median : 475.0   Median :2.800   VVS2:2
 Mean   : 540.7   Mean   :3.600
 3rd Qu.: 773.0   3rd Qu.:4.650
 Max.   :1000.0   Max.   :8.000
and then you can also do
R> lm(Price ~ Weight + Clarity, data = dat)
Call:
lm(formula = Price ~ Weight + Clarity, data = dat)
Coefficients:
(Intercept)       Weight  ClarityVVS1  ClarityVVS2
     -45.05        80.01       490.02       403.00
or, if you prefer a different coding (no intercept, one dummy per clarity level):
R> lm(Price ~ 0 + Weight + Clarity, data = dat)
Call:
lm(formula = Price ~ 0 + Weight + Clarity, data = dat)
Coefficients:
     Weight    ClarityIF  ClarityVVS1  ClarityVVS2
      80.01       -45.05       444.97       357.95
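To see the dummy variables that R constructs behind the scenes, and to
check that the two codings describe the same fitted model, one can
inspect the design matrices with model.matrix(). A short sketch using
the same data (object names such as X1, X0, fit1 and fit0 are only
illustrative):

```r
dat <- data.frame(
  Price   = c(500, 1000, 864, 340, 90, 450),
  Weight  = c(8, 5.2, 3, 2.6, 0.5, 2.3),
  Clarity = factor(c("IF", "VVS1", "VVS2")[c(1, 3, 2, 3, 1, 2)])
)

# Default treatment coding: an intercept plus dummies for VVS1 and VVS2;
# the first level (IF) is the baseline absorbed into the intercept.
X1 <- model.matrix(Price ~ Weight + Clarity, data = dat)
colnames(X1)  # "(Intercept)" "Weight" "ClarityVVS1" "ClarityVVS2"

# Without an intercept, R uses one 0/1 column for every level instead.
X0 <- model.matrix(Price ~ 0 + Weight + Clarity, data = dat)
colnames(X0)  # "Weight" "ClarityIF" "ClarityVVS1" "ClarityVVS2"

fit1 <- lm(Price ~ Weight + Clarity, data = dat)
fit0 <- lm(Price ~ 0 + Weight + Clarity, data = dat)

# Both design matrices span the same column space, so the fitted values
# agree; each level coefficient in fit0 is intercept + dummy in fit1.
all.equal(fitted(fit1), fitted(fit0))
unname(coef(fit1)["(Intercept)"] + coef(fit1)["ClarityVVS1"])  # about 444.97
unname(coef(fit0)["ClarityVVS1"])                             # about 444.97
```

The choice between the two is therefore one of interpretation: treatment
coding reports differences from the baseline level, the no-intercept
version reports a separate intercept for each level.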
Reading some introductory material on linear regression in R would be
useful. Also see ?lm, ?factor, ?model.matrix, ?contrasts, etc.
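As a pointer into ?factor and ?contrasts, here is a small sketch of two
common variations: changing the baseline level with relevel(), and
switching to sum-to-zero contrasts via the contrasts argument of lm().
Picking VVS2 as the new baseline is an arbitrary choice for illustration,
and the object names are hypothetical:

```r
dat <- data.frame(
  Price   = c(500, 1000, 864, 340, 90, 450),
  Weight  = c(8, 5.2, 3, 2.6, 0.5, 2.3),
  Clarity = factor(c("IF", "VVS1", "VVS2")[c(1, 3, 2, 3, 1, 2)])
)

# 1. Different baseline: relevel() moves VVS2 to the front of the levels,
#    so the dummies become ClarityIF and ClarityVVS1.
dat_ref <- dat
dat_ref$Clarity <- relevel(dat_ref$Clarity, ref = "VVS2")
fit_ref <- lm(Price ~ Weight + Clarity, data = dat_ref)
names(coef(fit_ref))  # "(Intercept)" "Weight" "ClarityIF" "ClarityVVS1"

# 2. Sum-to-zero (deviation) coding instead of the default treatment coding.
fit_sum <- lm(Price ~ Weight + Clarity, data = dat,
              contrasts = list(Clarity = "contr.sum"))
names(coef(fit_sum))  # "(Intercept)" "Weight" "Clarity1" "Clarity2"

# Under sum coding, the intercept is the unweighted mean of the three
# level-specific intercepts from the no-intercept parameterization.
fit0 <- lm(Price ~ 0 + Weight + Clarity, data = dat)
all.equal(unname(coef(fit_sum)["(Intercept)"]),
          mean(coef(fit0)[c("ClarityIF", "ClarityVVS1", "ClarityVVS2")]))
```

In all of these variants the Weight coefficient and the fitted values are
unchanged; only the meaning of the Clarity coefficients differs.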
hth,
Z
> Any help is greatly appreciated.
> Matthew
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>