[R] "Centered" dummy variables; non zero/one coding
Peter Holck
holck at hawaii.edu
Wed Oct 13 10:12:47 CEST 2004
I'm uncertain if this is perhaps a stupid question:
I want to create "centered" dummy variables to use in a call to glm(), and
am wondering if there's some slick method in R to do so. That is, rather than
using a factor, for which glm() returns coefficients describing the absence
or presence of each level, I'd like to fit a glm() without intercept such
that the estimated coefficients (and standard errors) represent the
"average" value in my data set for that variable.
An example: a data set has Race specified with 4 levels. I can manually
specify 4 dummy variables for a no-intercept model, where each variable,
rather than having a value of zero or one, has a centered value based on its
frequency of occurrence in the data set. Thus if 30% of the records in the
data set have Race of Hispanic, I can define a variable HISP that has a
value of either -.3 or .7, so that my coefficient estimate for HISP
represents the effect for an "average" person in the database (with a
corresponding valid standard error).
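To make the arithmetic concrete, here is a minimal sketch on made-up data (the ten-record RACE vector below is hypothetical and chosen only so that 30% of the records are Hispanic). Subtracting the mean of the 0/1 indicator gives the two centered values described above.

```r
# Hypothetical toy data: 10 records, 3 of them Hispanic (30%).
RACE <- factor(c("H", "H", "H", "W", "W", "W", "W", "B", "B", "OTHER"))

# The 0/1 indicator minus its mean gives the centered dummy.
HISP <- as.numeric(RACE == "H") - mean(RACE == "H")

# Hispanic records get 1 - 0.3, all other records get 0 - 0.3.
unique(round(HISP, 1))
```

By construction the centered variable has mean zero over the data set.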
One way to create these "centered dummy variables" from the original factor
is:
B       <- scale(RACE == "B", scale = FALSE)
W       <- scale(RACE == "W", scale = FALSE)
H       <- scale(RACE == "H", scale = FALSE)
OTHRACE <- scale(RACE == "OTHER", scale = FALSE)
However, I wonder if there is some method in R to avoid having to manually
define a large number of these dummy variables for a more complicated
dataset.
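One possible way to avoid the manual definitions, offered only as a sketch and not as an established recipe: build all indicator columns for a factor with model.matrix() and center them in one call to scale(). The data frame dat and its contents are hypothetical stand-ins for the real data set.

```r
# Hypothetical toy data standing in for the real data set.
dat <- data.frame(
  RACE = factor(c("B", "W", "H", "OTHER", "W", "H", "W", "B", "H", "W"))
)

# With "- 1" the intercept is dropped, so every level of RACE
# gets its own 0/1 indicator column.
ind <- model.matrix(~ RACE - 1, data = dat)

# scale(..., scale = FALSE) subtracts each column's mean without rescaling.
centered <- scale(ind, center = TRUE, scale = FALSE)

# Replace the "RACEB", "RACEH", ... names that model.matrix() generates.
colnames(centered) <- levels(dat$RACE)

colMeans(centered)   # every column is centered at (numerically) zero
```

This handles one factor at a time; with several factors the same two steps would have to be repeated per factor. Note also that the centered columns of a single factor sum to zero in every row, so they are linearly dependent when all of them are entered in the same model.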
Thanks in advance,
Peter Holck