[R] subset of factors in a regression

Tue Jul 2 06:39:59 CEST 2013

Philip A. Viton <viton.1 <at> osu.edu> writes:

> suppose "state" is a variable in a dataframe containing abbreviations 
> of the US states, as a factor. What I'd like to do is to include 
> dummy variables for a few of the states, (say, CA and MA) among the 
> independent variables in my regression formula. (This would be the 
> equivalent of creating, eg, ca<-state=="CA" and then including 
> that). I know I can create all the necessary dummy variables by using 
> the "outer" function on the factor and then renaming them 
> appropriately; but is there a solution that's more direct, ie that 
> doesn't involve a lot of new variables?
> 
> Thanks!

  You could use model.matrix(~state-1) and select the columns
you want, e.g.

state <- state.abb            ## example data: the 50 state abbreviations
m <- model.matrix(~state-1)   ## one indicator column per state
m[,colnames(m) %in% c("stateCA","stateMA")]

 -- but this will actually create a bunch of vectors you
don't want before throwing them away.

more compactly:

cstates <- c("CA","MA")   ## the states you want dummies for
m <- sapply(cstates,"==",state)
storage.mode(m) <- "numeric"
## or m[] <- as.numeric(m)
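
For completeness, here is a rough sketch of how the result might be used
in the regression itself. The data frame 'd' and the variables 'x' and 'y'
are made up for illustration (they are not from the original question),
so treat this as one possible approach rather than the definitive one.
A numeric matrix on the right-hand side of a formula contributes one
coefficient per column; alternatively the comparisons can go straight
into the formula via I(), which avoids creating any new variables.

```r
## hypothetical data: 'd', 'x' and 'y' are invented for this example
set.seed(1)
d <- data.frame(state = factor(state.abb), x = rnorm(50))
d$y <- 1 + 2 * d$x + 3 * (d$state == "CA") + rnorm(50)

## dummy matrix for the selected states, as above
cstates <- c("CA", "MA")
m <- sapply(cstates, "==", d$state)
storage.mode(m) <- "numeric"

## option 1: put the matrix in the formula (one coefficient per column)
fit1 <- lm(y ~ x + m, data = d)

## option 2: no extra variables, comparisons inside the formula
fit2 <- lm(y ~ x + I(state == "CA") + I(state == "MA"), data = d)

## both fits estimate the same coefficients, under different names
names(coef(fit1))
names(coef(fit2))
```

Option 2 is probably closest to what was asked for, since nothing new
is created outside the formula; option 1 is handier when the list of
states is long or built programmatically.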