[R] Regression with very high number of categorical variables

Bert Gunter gunter.berton at gene.com
Tue May 8 16:24:02 CEST 2012


You have received no answer yet. I think this is largely because there
is no simple answer.

1. You don't need to mess with dummy variables. If a categorical predictor
is stored as a factor, R's model formula machinery creates the dummy
(contrast) columns itself. Please read up on how to do regression in R.

2. However, it may not work anyway: you may have too many variables and
categories for your data. Or it may run but produce nothing useful or
sensible.

3. How best to handle this sort of situation depends on the subject matter
area. I strongly recommend that you seek local statistical help if you can.
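Points 1 and 2 can be sketched with simulated data. The data frame `dat`, the columns `x1` and `zip`, and the sizes (500 rows, up to 200 ZIP-like levels) are hypothetical, not taken from the question. Storing the categorical variable as a factor is enough for lm() to build the dummy columns, and model.matrix() shows how many columns that produces.

```r
set.seed(1)
n <- 500

# Hypothetical data: one numeric predictor and one high-cardinality
# categorical predictor standing in for ZIP code, stored as a factor.
dat <- data.frame(
  y   = rnorm(n),
  x1  = rnorm(n),
  zip = factor(sample(sprintf("Z%03d", 1:200), n, replace = TRUE))
)

# No manual 1/0 coding: lm() expands the factor into contrast columns itself.
fit <- lm(y ~ x1 + zip, data = dat)

# Design matrix: intercept + x1 + (number of observed zip levels - 1)
# treatment-contrast columns.
X <- model.matrix(fit)
ncol(X)
nlevels(dat$zip)

# Levels observed only once get their own coefficient, so the fit passes
# exactly through those observations.
sum(table(dat$zip) == 1)

# Residual degrees of freedom left after spending columns on the dummies.
df.residual(fit)
```

With only a few observations per level, most of the residual degrees of freedom are spent on the ZIP dummies, which is the kind of fit point 2 warns may run but not be useful.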

-- Bert

On Tue, May 8, 2012 at 3:29 AM, Michael Haenlein <haenlein at escpeurope.eu> wrote:
> Dear all,
>
> I would like to run a simple regression model y~x1+x2+x3+...
>
> The problem is that I have a lot of independent variables (xi) -- around
> one hundred -- and that some of them are categorical with a lot of
> categories (like, for example, ZIP code). One straightforward way would be
> to (a) transform all categorical variables into 1/0 dummies and (b) enter
> all the variables into an lm model. But I'm not sure whether this is very
> efficient, especially since the analysis is exploratory in nature and I
> expect that many of the xi will have no significant impact on y.
>
> Is there an R package that can handle such a setting? I have read about
> "Hierarchical Bayesian variance components models" that have been used with
> ZIP data (www.jstor.org/stable/10.2307/4129723), but I'm not sure to what
> extent there is an R function that does this in a straightforward manner.
>
> Thanks,
>
> Michael
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
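The question mentions variance components models for ZIP data. As a rough, non-Bayesian stand-in (not the hierarchical Bayesian model in the cited paper), a random intercept for ZIP can be fitted with lme() from nlme, a recommended package that ships with standard R installations. The simulated data and all names below are hypothetical.

```r
library(nlme)  # recommended package, normally installed with R

set.seed(2)
n <- 1000

# Hypothetical data: each ZIP shifts the mean of y by its own random amount.
zip <- factor(sample(sprintf("Z%03d", 1:100), n, replace = TRUE))
zip_effect <- rnorm(nlevels(zip), sd = 0.5)
x1 <- rnorm(n)
y <- 1 + 2 * x1 + zip_effect[as.integer(zip)] + rnorm(n)
dat <- data.frame(y, x1, zip)

# Random intercept for zip: one between-ZIP variance parameter is estimated
# instead of one fixed coefficient per ZIP level.
fit_re <- lme(y ~ x1, random = ~ 1 | zip, data = dat)

fixef(fit_re)        # fixed effects: intercept and slope for x1
VarCorr(fit_re)      # estimated between-ZIP and residual variances
head(ranef(fit_re))  # predicted per-ZIP deviations, shrunk toward zero
```

The per-ZIP deviations are shrunk toward zero, so sparsely observed ZIP codes borrow strength from the rest; whether that assumption suits the data is the subject-matter question raised in point 3 of the reply.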



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


