[R] Large number of dummy variables
Bert Gunter
gunter.berton at gene.com
Tue Jul 22 00:45:03 CEST 2008
Unless I'm way off base, dummy variables are never needed (nor are they desirable)
in R; they should be modelled as factors instead. "An Introduction to R" might, and
certainly V&R's MASS and others will, explain this in more detail.
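
As a rough illustration of the point about factors, here is a minimal sketch on made-up data (the data frame `d` and the column names `importer`, `x`, and `y` are hypothetical, not taken from the original question). It shows lm() expanding a factor into indicator columns on its own, so no dummies need to be built by hand.

```r
# Hypothetical toy data; none of these names come from the original post.
set.seed(1)
d <- data.frame(
  importer = factor(rep(c("A", "B", "C"), each = 10)),
  x = rnorm(30)
)
d$y <- 1 + 2 * d$x + as.integer(d$importer) + rnorm(30, sd = 0.1)

# lm() expands the factor into indicator columns internally
# (treatment contrasts by default, first level as reference).
fit <- lm(y ~ x + importer, data = d)

# Inspect the design matrix that lm() actually used.
X <- model.matrix(fit)
print(colnames(X))
```

Note that this is about convenience and correctness of the coding, not memory: lm() still builds a dense design matrix, so a factor with well over a thousand levels can still run into allocation errors.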
-- Bert Gunter
Genentech, Inc.
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Doran, Harold
Sent: Monday, July 21, 2008 3:16 PM
To: aspearot at ucsc.edu; r-help at r-project.org
Cc: Douglas Bates
Subject: Re: [R] Large number of dummy variables
Well, at the risk of entering a debate I really don't have time for (I'm
doing it anyway) why not consider a random coefficient model? If your
response has anything like, "well, random effects and fixed effects are
correlated and so the estimates are biased but OLS is consistent and
unbiased via an appeal to Gauss-Markov" then I will probably make time
for this discussion :)
I have experienced this problem, though. In what you're doing, you are
first creating the model matrix and then doing the demeaning, correct? I
do recall Doug Bates was, at one point, doing some work where the model
matrix for the fixed effects was immediately created as a sparse matrix
for OLS models. I think doing the work on the sparse matrix is a better
analytical method than time-demeaning. I don't remember where that work
is, though.
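
For reference, the demeaning approach discussed above can be sketched for the single-group case as follows. This is only an illustration on simulated data (the names `g`, `x`, and `y` are hypothetical); it checks that the slope from the demeaned regression matches the slope from the regression with a full set of group indicators.

```r
# Simulated data with one grouping factor; all names are hypothetical.
set.seed(42)
n <- 200
g <- factor(rep(1:20, each = 10))
x <- rnorm(n)
group_effect <- rnorm(20)
y <- 0.5 * x + group_effect[as.integer(g)] + rnorm(n, sd = 0.1)

# Regression with the grouping factor expanded into indicator columns.
b_dummy <- unname(coef(lm(y ~ x + g))["x"])

# Within transformation: subtract group means from y and x, then regress
# without an intercept (the demeaned variables have mean zero by group).
y_dm <- y - ave(y, g)
x_dm <- x - ave(x, g)
b_within <- unname(coef(lm(y_dm ~ x_dm - 1))["x_dm"])

# The two slope estimates should agree up to numerical error.
print(all.equal(b_dummy, b_within))
```

With two non-nested grouping factors, a single pass of demeaning over each factor generally does not reproduce the dummy-variable estimates unless the design is balanced, which is consistent with the difficulty described in the original question.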
There is a package called SparseM which had functions for doing OLS with
sparse matrices. I don't know its status, but vaguely recall the author
of SparseM at one point noting that the work of Bates and Maechler would
be the go-to package for work with large, sparse model matrices.
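
A minimal sketch of the sparse-matrix route, assuming the work of Bates and Maechler referred to here is the Matrix package (which they author). This is not necessarily the method Doug Bates was developing; it simply builds a sparse design matrix with `sparse.model.matrix()` and solves the normal equations. The data are simulated and the column names `imp`, `expo`, `x`, and `y` are hypothetical stand-ins for the year-importer and year-exporter factors.

```r
library(Matrix)

# Simulated stand-in data; sizes are far smaller than in the original question.
set.seed(123)
n <- 5000
d <- data.frame(
  imp  = factor(sample(50, n, replace = TRUE)),  # stand-in for year-importer
  expo = factor(sample(60, n, replace = TRUE)),  # stand-in for year-exporter
  x    = rnorm(n)
)
imp_eff  <- rnorm(50)
expo_eff <- rnorm(60)
d$y <- 2 * d$x + imp_eff[as.integer(d$imp)] +
  expo_eff[as.integer(d$expo)] + rnorm(n, sd = 0.1)

# Sparse design matrix: only the nonzero entries are stored.
X <- sparse.model.matrix(~ x + imp + expo, data = d)

# Solve the normal equations X'X b = X'y using sparse algebra.
XtX  <- crossprod(X)
Xty  <- crossprod(X, d$y)
beta <- drop(as.matrix(solve(XtX, Xty)))

# Pick out the slope on x by its column position in X.
b_x <- beta[which(colnames(X) == "x")]
print(b_x)
```

For scale, a dense 100,000 x 2,768 matrix of doubles needs roughly 2.2 GB, while the sparse version stores only a handful of nonzeros per row. Solving the normal equations directly is less numerically stable than the QR decomposition that lm() uses, so this is a sketch of the idea rather than a drop-in replacement.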
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Alan Spearot
> Sent: Monday, July 21, 2008 5:59 PM
> To: r-help at r-project.org
> Subject: [R] Large number of dummy variables
>
> Hello,
>
> I'm trying to run a regression predicting trade flows between
> importers and exporters. I wish to include both
> year-importer dummies and year-exporter dummies. The former
> includes 1378 levels, and the latter includes 1390 levels. I
> have roughly 100,000 total observations.
>
> When I use lm() to run a simple regression, it gives me a
> "cannot allocate ___" error. I've been able to get around this by
> time-demeaning over one large group, but since I have two, it
> doesn't work correctly. Is there a more efficient
> way to handle a model matrix this large in R?
>
> Thanks for your help.
>
> Alan Spearot
>
> --
> Alan Spearot
> Assistant Professor - International Economics University of
> California - Santa Cruz
> 1156 High Street
> 453 Engineering 2
> Santa Cruz, CA 95064
> Office: (831) 459-1530
> acspearot at gmail.com
> http://people.ucsc.edu/~aspearot
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>