[R] simplify code for dummy coding of factors
John Fox
jfox at mcmaster.ca
Wed Dec 31 00:56:13 CET 2014
Hi Michael,
At first I thought that as.numeric() would do it, but that loses the matrix
structure. Here are two solutions; I think that I prefer the second.
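As an aside on the as.numeric() remark, here is a minimal sketch of what goes wrong and two base-R ways around it. The toy factor f and the object names are made up for illustration; they are not part of Michael's example.

```r
# Toy factor standing in for haireye.df$Hair (hypothetical data)
f <- factor(c("a", "b", "c", "a"))

# Logical 4 x 3 matrix of equalities, as in the original outer() call
eq <- outer(f, levels(f), `==`)

# as.numeric() converts the values but drops the dim attribute,
# so the result is a plain vector of length 12
flat <- as.numeric(eq)
is.null(dim(flat))

# Arithmetic keeps attributes, which is why 0 + outer(...) works
num1 <- 0 + eq
dim(num1)

# Changing the storage mode in place also keeps the dimensions
num2 <- eq
storage.mode(num2) <- "double"
dim(num2)
```

Either of the last two forms gives a numeric 0/1 matrix with the same shape as the logical one.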
----------- snip --------------------
> (dummy.hair <- outer(haireye.df$Hair,
+ levels(haireye.df$Hair), function(x, y) as.numeric(x == y)))
      [,1] [,2] [,3] [,4]
 [1,]    1    0    0    0
 [2,]    0    1    0    0
 [3,]    0    0    1    0
 [4,]    0    0    0    1
 [5,]    1    0    0    0
 [6,]    0    1    0    0
 [7,]    0    0    1    0
 [8,]    0    0    0    1
 [9,]    1    0    0    0
[10,]    0    1    0    0
[11,]    0    0    1    0
[12,]    0    0    0    1
[13,]    1    0    0    0
[14,]    0    1    0    0
[15,]    0    0    1    0
[16,]    0    0    0    1
> (dummy.hair <- model.matrix(~ -1 + Hair, data=haireye.df))
   HairBlack HairBrown HairRed HairBlond
1          1         0       0         0
2          0         1       0         0
3          0         0       1         0
4          0         0       0         1
5          1         0       0         0
6          0         1       0         0
7          0         0       1         0
8          0         0       0         1
9          1         0       0         0
10         0         1       0         0
11         0         0       1         0
12         0         0       0         1
13         1         0       0         0
14         0         1       0         0
15         0         0       1         0
16         0         0       0         1
attr(,"assign")
[1] 1 1 1 1
attr(,"contrasts")
attr(,"contrasts")$Hair
[1] "contr.treatment"
----------- snip --------------------
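To get the *same* data frame as in the original post, the model.matrix() approach needs the columns renamed and one call per factor. The following is only a sketch along those lines, reusing the object names from Michael's code. The factors are handled separately because a single formula such as ~ -1 + Hair + Eye would give full indicator coding for Hair only and contrast coding (three columns) for Eye.

```r
# Two-way table of hair by eye colour, as in the original post
haireye <- margin.table(HairEyeColor, 1:2)
haireye.df <- as.data.frame(haireye)

# One indicator column per level; "-1" removes the intercept so that
# no level is dropped as the baseline
dummy.hair <- model.matrix(~ -1 + Hair, data = haireye.df)
dummy.eye  <- model.matrix(~ -1 + Eye,  data = haireye.df)

# Rename to the short column names used in the manuscript
colnames(dummy.hair) <- paste0("h", 1:4)
colnames(dummy.eye)  <- paste0("e", 1:4)

haireye.df <- data.frame(haireye.df, dummy.hair, dummy.eye)
haireye.df
```

The columns follow the order of the factor levels, so h1 to h4 correspond to Black, Brown, Red, Blond and e1 to e4 to Brown, Blue, Hazel, Green.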
I hope this helps,
John
> -----Original Message-----
> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Michael
> Friendly
> Sent: Tuesday, December 30, 2014 6:05 PM
> To: R-help
> Subject: [R] simplify code for dummy coding of factors
>
> In a manuscript, I have the following code to illustrate dummy coding of
> two factors in a contingency table.
>
> It works, but is surely obscured by the method I used, involving outer()
> to find equalities and 0+outer() to convert to numeric. Can someone help
> simplify this code to be more comprehensible and give the *same* result?
> I'd prefer a solution that uses base R.
>
> haireye <- margin.table(HairEyeColor, 1:2)
>
> haireye.df <- as.data.frame(haireye)
> dummy.hair <- 0+outer(haireye.df$Hair, levels(haireye.df$Hair), `==`)
> colnames(dummy.hair) <- paste0('h', 1:4)
> dummy.eye <- 0+outer(haireye.df$Eye, levels(haireye.df$Eye), `==`)
> colnames(dummy.eye) <- paste0('e', 1:4)
>
> haireye.df <- data.frame(haireye.df, dummy.hair, dummy.eye)
> haireye.df
>
> > haireye.df
>     Hair   Eye Freq h1 h2 h3 h4 e1 e2 e3 e4
> 1  Black Brown   68  1  0  0  0  1  0  0  0
> 2  Brown Brown  119  0  1  0  0  1  0  0  0
> 3    Red Brown   26  0  0  1  0  1  0  0  0
> 4  Blond Brown    7  0  0  0  1  1  0  0  0
> 5  Black  Blue   20  1  0  0  0  0  1  0  0
> 6  Brown  Blue   84  0  1  0  0  0  1  0  0
> 7    Red  Blue   17  0  0  1  0  0  1  0  0
> 8  Blond  Blue   94  0  0  0  1  0  1  0  0
> 9  Black Hazel   15  1  0  0  0  0  0  1  0
> 10 Brown Hazel   54  0  1  0  0  0  0  1  0
> 11   Red Hazel   14  0  0  1  0  0  0  1  0
> 12 Blond Hazel   10  0  0  0  1  0  0  1  0
> 13 Black Green    5  1  0  0  0  0  0  0  1
> 14 Brown Green   29  0  1  0  0  0  0  0  1
> 15   Red Green   14  0  0  1  0  0  0  0  1
> 16 Blond Green   16  0  0  0  1  0  0  0  1
> >
>
> --
> Michael Friendly Email: friendly AT yorku DOT ca
> Professor, Psychology Dept. & Chair, Quantitative Methods
> York University Voice: 416 736-2100 x66249 Fax: 416 736-5814
> 4700 Keele Street Web:http://www.datavis.ca
> Toronto, ONT M3J 1P3 CANADA