[R] plm(..., model="within", effect="twoways") is very slow on unablanaced data (was: Re: Regressions with fixed-effect in R)
Liviu Andronic
landronimirc at gmail.com
Mon May 17 17:16:25 CEST 2010
Hello Giovanni
I made a minor modification to your function, which now allows to
compute the within R-sq in Twoways Within models (see below).
However I ran into an issue that I have already encountered before:
whenever I try to fit Twoways Within models on my unbalanced data, the
process is strangely slow and I usually terminate it either after
~15min or when my CPU hits 100C. This is similar to what I mentioned
on the list some time ago [1].
[1] http://www.mail-archive.com/r-help@r-project.org/msg89421.html
Unfortunately I get the same slow behaviour when using
pmodel.response(x, model="within", effect="twoways") to compute teh
within R-sq. Below is an example that approximates the structure of my
data.
### define fun to compute the within R-sq
pmodel.response<-plm:::pmodel.response.plm
plmr2 <-
function(x, adj=TRUE, effect="individual")
{
## fetch response and residuals
y <- pmodel.response(x, model="within", effect=effect)
myres <- resid(x)
n <- length(myres)
if(adj) {
adjssr <- x$df.residual
adjtss <- n-1
} else {
adjssr <- 1
adjtss <- 1
}
ssr <- sum(myres^2)/adjssr
tss <- sum(y^2)/adjtss
return(1-ssr/tss)
}
## prepare the data
set.seed(1)
n <- 2000
T <- 15
x=rnorm(T*n)
x1=runif(T*n)
fe=rep(rnorm(1*n),each=T)
id=rep(1:(1*n),each=T)
ti=rep(1:T,1*n)
e=rnorm(T*n)
y=x+x1+fe+e
subs <- as.logical(rbinom(T*n, 1, .6)); sum(subs)
data <- data.frame(y,x,x1,id,ti)
dim(data)
#[1] 30000 5
pdata <- plm.data(data[subs , ], index = c("id", "ti"))
dim(pdata)
#[1] 18018 5
paste("n=", length(unique(pdata$id))); paste("T=",
length(unique(pdata$ti))); paste("N=", length((pdata$ti)))
#[1] "n= 2000"
#[1] "T= 15"
#[1] "N= 18018"
## individual within model: no issues, 5sec affair
pd_fe1 <- plm(y ~ x + x1, data = pdata,
model = "within", effect="individual")
summary(pd_fe1)
plmr2(pd_fe1, F, "individual")
## twoways within model: never finishes in reasonable time (prepare to
terminate the process)
pd_fe1 <- plm(y ~ x + x1, data = pdata,
model = "within", effect="twoways")
summary(pd_fe1)
plmr2(pd_fe1, F, "twoways") ##neither this on pd_fe1 fitted previously
## individual within with manual time effs: ~10sec
pd_fe2 <- plm(y ~ x + x1 + ti, data = pdata,
model = "within", effect="individual")
summary(pd_fe2)
plmr2(pd_fe2, F, "individual")
#[1] 0.52888
## extract within R-sq from the above fails (prepare to terminate the process)
plmr2(pd_fe2, F, "twoways")
I would have went ahead and computed the within R-sq manually, from
pd_fe2, however the below computes the "individual" within R-sq
RSS2 <- sum(residuals(pd_fe2)^2); RSS2
TSS2 <- plm:::tss.plm(pd_fe2); TSS2
1 - RSS2/TSS2
#[1] 0.52888
while the below yields zero
SSR2 <- sum(fitted(pd_fe2)^2); SSR2
SSR2/TSS2
#[1] 0
because
fitted(pd_fe2)
#NULL
I could also report that plm(..., model="within", effect="twoways") is
quick when the number of individuals is kept low; re-run the same
example with n <- 200.
Any ideas on how to obtain the within R-sq from a Twoways Within fit
on unbalanced panel data with n>2000? Thank you
Liviu
On 5/12/10, Millo Giovanni <Giovanni_Millo at generali.com> wrote:
> Dear Liviu,
>
> we're still working on measures of fit for panels. If I get you right,
> what you mean is the R^2 of the demeaned, or "within", regression. A
> quick and dirty function to do this is:
>
> pmodel.response<-plm:::pmodel.response.plm # needs this to make the
> method accessible
>
> r2<-function(x, adj=TRUE) {
> ## fetch response and residuals
> y <- pmodel.response(x, model="within")
> myres <- resid(x)
> n <- length(myres)
>
> if(adj) {
> adjssr <- x$df.residual
> adjtss <- n-1
> } else {
> adjssr <- 1
> adjtss <- 1
> }
>
> ssr <- sum(myres^2)/adjssr
> tss <- sum(y^2)/adjtss
> return(1-ssr/tss)
> }
>
> and then
>
> > r2(yourmodel)
>
> Hope this helps
> Giovanni
>
> ------------------------------
More information about the R-help
mailing list