[R] fixed effects with clustered standard errors
Millo Giovanni
Giovanni_Millo at Generali.com
Wed Feb 8 18:53:56 CET 2012
Dear John,
interesting. There must be a bottleneck somewhere, which possibly went
unnoticed because econometricians seldom use so many data points. In
fact 'plm' wasn't designed to handle "only" 700 Megs of data at a time;
but we're happy to investigate in this direction too. E.g., I was aware
of some efficiency problems if effect="twoways" but I seem to understand
that you are using effect="individual"? --> which takes me to the main
point.
I understand that enclosing the data for a reproducible report, as
requested by the posting guide, is awkward for such a big dataset. Yet
it would be of great help if you at least produced:
- an output of your procedure, in order to see what goes wrong and where
- the output of traceback() called immediately after you got the error
(idem)
and possibly gave it a try with lm() applied to the very same formula
and data, maybe into a system.time( ... ) statement.
Else, the information you provide is way too scant to even make an
educated guess. For example, it isn't clear whether the problem is
related to plm() or to vcovHC.plm etc.
As far as "simple demeaning" is concerned, you might try the following
code, which really does only that. Be aware that **standard errors are
biased** etc. etc., this is not meant to be a proper function but just a
computational test for your data and a quick demonstration of demeaning.
'plm()' is far more structured, for a number of reasons. Please execute
it inside system.time() again.
######### test function for within model, BIASED SEs !! #############
##
## ## example:
## data(Produc, package="plm")
## mod <- FEmod(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
index=Produc$state, data=Produc)
## summary(mod)
## ## compare with:
## library(plm)
## example(plm)
demean<-function(x,index,lambda=1,na.rm=F) {
as.vector(x-lambda*tapply(x,index,mean,na.rm=na.rm)[unclass(as.factor(in
dex))])
}
FEmod<-function(formula,index,data=ls()) {
## fit a model without intercept in any case
formula<-as.formula(paste(deparse(formula(formula)),"-1",sep=""))
X<-model.matrix(formula,data=data)
y<-model.response(model.frame(formula,data=data))
## reduce index accordingly
names(index)<-row.names(data)
ind<-index[which(names(index)%in%row.names(X))]
## within transf.
MX<-matrix(NA,ncol=dim(X)[[2]],nrow=dim(X)[[1]])
for(i in 1:dim(X)[[2]]) {
MX[,i]<-demean(X[,i],index=ind,lambda=1)
}
My<-demean(y,index=ind,lambda=1)
## estimate within model
femod<-lm(My~MX-1)
return(femod)
}
####### end test function ########
Best,
Giovanni
########### original message #############
------------------------------
Message: 28
Date: Tue, 07 Feb 2012 15:35:07 +0100
From: cariboupad at gmx.fr
To: r-help at r-project.org
Subject: [R] fixed effects with clustered standard errors
Message-ID: <20120207143507.142490 at gmx.com>
Content-Type: text/plain; charset="utf-8"
Dear R-helpers,
I have a very simple question and I really hope that someone could help
me
I would like to estimate a simple fixed effect regression model with
clustered standard errors by individuals.
For those using Stata, the counterpart would be xtreg with the "fe"
option, or areg with the "absorb" option and in both case the clustering
is achieved with "vce(cluster id)"
My question is : how could I do that with R ? An important point is that
I have too many individuals, therefore I cannot include dummies and
should use the demeaning "usual" procedure.
I tried with the plm package with the "within" option, but R quikcly
tells me that the memory limits are attained (I have over 10go ram!)
while the dataset is only 700mo (about 50 000 individuals, highly
unbalanced)
I dont understand... plm do indeed demean the data so the computation
should be fast and light enough... ?!
Are there any other solutions ?
Many thanks in advance ! ;)
John
############ end original message ############
Giovanni Millo, PhD
Research Dept.,
Assicurazioni Generali SpA
Via Machiavelli 4,
34132 Trieste (Italy)
tel. +39 040 671184
fax +39 040 671160
Ai sensi del D.Lgs. 196/2003 si precisa che le informazi...{{dropped:12}}
More information about the R-help
mailing list