[R] subset of factors in a regression
David Winsemius
dwinsemius at comcast.net
Tue Jul 2 16:01:07 CEST 2013
On Jul 1, 2013, at 9:39 PM, Ben Bolker wrote:
> Philip A. Viton <viton.1 <at> osu.edu> writes:
>
>> suppose "state" is a variable in a dataframe containing abbreviations
>> of the US states, as a factor. What I'd like to do is to include
>> dummy variables for a few of the states, (say, CA and MA) among the
>> independent variables in my regression formula. (This would be the
>> equivalent of creating, e.g., ca <- state=="CA", and then including
>> that). I know I can create all the necessary dummy variables by using
>> the "outer" function on the factor and then renaming them
>> appropriately; but is there a solution that's more direct, ie that
>> doesn't involve a lot of new variables?
>>
>> Thanks!
>
> You could use model.matrix(~state-1) and select the columns
> you want, e.g.
>
> state <- state.abb; m <- model.matrix(~state-1)
> m[,colnames(m) %in% c("stateCA","stateMA")]
>
> -- but this will actually create a bunch of vectors you
> don't want before throwing them away.
>
> more compactly:
>
> cstates <- c("CA","MA")  ## the states to get dummies for
> m <- sapply(cstates,"==",state)
> storage.mode(m) <- "numeric"
> ## or m[] <- as.numeric(m)
Couldn't this be achieved with I()?
lm(Y ~ I(state=="CA") + I(state=="MA") + covariates, data=dfrm)
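
A minimal sketch of how that might look on simulated data, assuming a data frame dfrm with a response Y, the factor state and one covariate x (all of these names and values are made up for illustration). It also fits the same model with explicitly created dummy variables, so the two sets of estimates can be compared:

```r
# Simulated data; dfrm, Y and x are hypothetical names, not from the original post.
set.seed(1)
dfrm <- data.frame(
  state = factor(rep(state.abb, times = 4)),  # each state appears 4 times
  x     = rnorm(200)
)
dfrm$Y <- 1 + 2 * (dfrm$state == "CA") - 1 * (dfrm$state == "MA") +
  0.5 * dfrm$x + rnorm(200)

# I() evaluates the comparison inside the formula, so no new columns are needed.
fit_I <- lm(Y ~ I(state == "CA") + I(state == "MA") + x, data = dfrm)

# The same model with explicitly created 0/1 dummy variables, for comparison.
dfrm$ca <- as.numeric(dfrm$state == "CA")
dfrm$ma <- as.numeric(dfrm$state == "MA")
fit_d <- lm(Y ~ ca + ma + x, data = dfrm)

# Coefficient names differ between the two fits, but the estimates should agree.
print(all.equal(unname(coef(fit_I)), unname(coef(fit_d))))
```

The I() terms enter the model as logical variables, so their coefficients are labelled along the lines of I(state == "CA")TRUE rather than ca.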
--
David Winsemius
Alameda, CA, USA