[R] Creating Dummy Variables in R

Achim Zeileis Achim.Zeileis at wu-wien.ac.at
Wed Dec 16 16:19:07 CET 2009


On Wed, 16 Dec 2009, whitaker m. (mw1006) wrote:

> Hi,
> I am trying to create a set of dummy variables to use within a multiple linear regression and am unable to find the code for this in the manuals.
>
> For example, I have:
> Price     Weight     Clarity
>                      IF      VVS1    VVS2
> 500       8          1       0       0
> 1000      5.2        0       0       1
> 864       3          0       1       0
> 340       2.6        0       0       1
> 90        0.5        1       0       0
> 450       2.3        0       1       0
>
> Where price is dependent upon weight (single value in each observation) and clarity (split into three levels, IF, VVS1, VVS2).
> I am having trouble telling the program that clarity is a set of 3 dummy variables and keep getting error messages. What is the correct way?

You should code the categorical variable "Clarity" as a "factor" so that R 
knows that this is a categorical variable and can deal with it 
appropriately in subsequent computations such as summary() or lm().

Thus, I would recommend storing your data as

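dat <- data.frame(
   Price = c(500, 1000, 864, 340, 90, 450),
   Weight = c(8, 5.2, 3, 2.6, 0.5, 2.3),
   ## factor() is needed explicitly in R >= 4.0.0, where data.frame()
   ## no longer converts strings to factors by default
   Clarity = factor(c("IF", "VVS1", "VVS2")[c(1, 3, 2, 3, 1, 2)]))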

which yields, e.g.,

R> summary(dat)
      Price            Weight      Clarity
  Min.   :  90.0   Min.   :0.500   IF  :2
  1st Qu.: 367.5   1st Qu.:2.375   VVS1:2
  Median : 475.0   Median :2.800   VVS2:2
  Mean   : 540.7   Mean   :3.600
  3rd Qu.: 773.0   3rd Qu.:4.650
  Max.   :1000.0   Max.   :8.000

and then you can also do

R> lm(Price ~ Weight + Clarity, data = dat)

Call:
lm(formula = Price ~ Weight + Clarity, data = dat)

Coefficients:
(Intercept)       Weight  ClarityVVS1  ClarityVVS2
      -45.05        80.01       490.02       403.00

or, if you wish to choose a different coding (no intercept, one coefficient
per Clarity level)

R> lm(Price ~ 0 + Weight + Clarity, data = dat)

Call:
lm(formula = Price ~ 0 + Weight + Clarity, data = dat)

Coefficients:
      Weight    ClarityIF  ClarityVVS1  ClarityVVS2
       80.01       -45.05       444.97       357.95
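
To see which dummy variables R actually builds for the two formulas, one can
inspect the design matrices with model.matrix(). The following is only a
sketch: the object names X1, X0, m1 and m0 are my own choices for
illustration, and the data are simply re-entered so that the snippet runs on
its own.

```r
# Same six observations as above; factor() is written out explicitly
# so that the snippet also works in R >= 4.0.0.
dat <- data.frame(
  Price = c(500, 1000, 864, 340, 90, 450),
  Weight = c(8, 5.2, 3, 2.6, 0.5, 2.3),
  Clarity = factor(c("IF", "VVS1", "VVS2")[c(1, 3, 2, 3, 1, 2)]))

# With intercept: IF is the reference level, so only ClarityVVS1 and
# ClarityVVS2 appear as 0/1 columns.
X1 <- model.matrix(Price ~ Weight + Clarity, data = dat)

# Without intercept: one 0/1 column for each Clarity level.
X0 <- model.matrix(Price ~ 0 + Weight + Clarity, data = dat)

colnames(X1)
colnames(X0)

# Both codings describe the same fit; only the parametrization differs.
m1 <- lm(Price ~ Weight + Clarity, data = dat)
m0 <- lm(Price ~ 0 + Weight + Clarity, data = dat)
all.equal(fitted(m1), fitted(m0))
```

In the first coding the Clarity coefficients are differences from the IF
level; in the second they are the level-specific intercepts, which is why
ClarityIF in the second output equals (Intercept) in the first.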


Some further reading of introductory material on linear regression in R 
would be useful. Also look at ?lm, ?factor, ?model.matrix, ?contrasts etc.
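
If the data already sit in three separate 0/1 columns, as in the table in the
question, one possible way to collapse them into a single factor is sketched
below. The object names raw and lev and the column layout are assumptions
made for illustration, and the approach presumes that exactly one of the
three indicators equals 1 in every row.

```r
# Hypothetical layout: the three 0/1 indicator columns from the question.
raw <- data.frame(
  Price  = c(500, 1000, 864, 340, 90, 450),
  Weight = c(8, 5.2, 3, 2.6, 0.5, 2.3),
  IF     = c(1, 0, 0, 0, 1, 0),
  VVS1   = c(0, 0, 1, 0, 0, 1),
  VVS2   = c(0, 1, 0, 1, 0, 0))

lev <- c("IF", "VVS1", "VVS2")

# max.col() gives, for each row, the index of the column holding the
# largest value, i.e. the column containing the 1.
idx <- max.col(as.matrix(raw[, lev]), ties.method = "first")
raw$Clarity <- factor(lev[idx], levels = lev)

table(raw$Clarity)
lm(Price ~ Weight + Clarity, data = raw)
```

After this step the indicator columns are no longer needed in the formula;
lm() derives its own dummy variables from the factor.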

hth,
Z

> Any help is greatly appreciated.
> Matthew