[R] "Centered" dummy variables; non zero/one coding

Peter Holck holck at hawaii.edu
Wed Oct 13 10:12:47 CEST 2004


I'm uncertain if this is perhaps a stupid question:

I want to create "centered" dummy variables to use in a call to glm(), and am
wondering if there's some slick method in R to do so.  That is, rather than
use a factor, for which glm() returns coefficients describing the presence
or absence of each level, I'd like to fit a glm() without intercept such
that the estimated coefficients (and their standard errors) represent the
"average" value in my data set for that variable.

An example: a data set has Race specified with 4 levels.  I can manually
specify 4 dummy variables for a no-intercept model where each variable,
rather than having a value of zero or one, has a centered value based on its
frequency of occurrence in the data set.  Thus if 30% of the records in the
data set have Race of Hispanic, I can define a variable HISP that has a
value of either -0.3 or 0.7, so that my coefficient estimate for HISP
represents the effect for an "average" person in the database (with a
corresponding valid standard error).
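
The arithmetic of that coding can be checked with a minimal sketch. The
ten-record RACE vector here is made up purely for illustration; only the
30% Hispanic share comes from the example above.

```r
# Hypothetical data: 10 records, 3 of which (30%) are Hispanic.
RACE <- c("H", "H", "H", "W", "W", "W", "W", "B", "B", "OTHER")

# 0/1 indicator minus its mean (the proportion of Hispanic records).
HISP <- (RACE == "H") - mean(RACE == "H")

sort(unique(HISP))  # the two values taken: -0.3 and 0.7
mean(HISP)          # zero, up to floating-point rounding
```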

One way to create these "centered dummy variables" from the original factor
is:
		# scale(x, scale=FALSE) subtracts the column mean without rescaling
		data.frame(
		"B"=scale(RACE=="B",scale=FALSE),
		"W"=scale(RACE=="W",scale=FALSE),
		"H"=scale(RACE=="H",scale=FALSE),
		"OTHRACE"=scale(RACE=="OTHER",scale=FALSE)
		)

However, I wonder if there is some method in R that avoids having to
manually define a large number of these dummy variables for a more
complicated dataset.
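
A minimal sketch of one way this might be automated, assuming the factor
lives in a data frame: model.matrix() builds the full set of 0/1 indicator
columns and scale() centers them all at once.  The data frame name dat and
its contents are hypothetical, and this is only an illustration, not
necessarily the best approach.

```r
# Hypothetical data frame; the RACE levels follow the example above.
dat <- data.frame(
  RACE = factor(c("B", "W", "H", "OTHER", "W", "H", "W", "B", "H", "W"))
)

# One 0/1 indicator column per level (the "- 1" drops the intercept column).
X <- model.matrix(~ RACE - 1, data = dat)

# Subtract each column's mean; scale = FALSE leaves the units untouched.
Xc <- scale(X, center = TRUE, scale = FALSE)

colnames(Xc)   # "RACEB" "RACEH" "RACEOTHER" "RACEW"
colMeans(Xc)   # all zero, up to floating-point rounding
rowSums(Xc)    # also all zero: the centered columns of one factor are
               # linearly dependent
```

Because the centered columns for a single factor sum to zero within each
row, a model that includes all of them may not be able to estimate every
coefficient, so one column might have to be dropped before fitting.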

Thanks in advance,
Peter Holck



