[R] Modifying the internals of mgcv's G object to program a within estimator
Andrew Crane-Droesch
andrewcd at gmail.com
Thu Aug 15 19:03:56 CEST 2013
I'm trying to write a function to make gam (in mgcv) give me the
fixed-effects "within" estimator, which is equivalent to the OLS dummy
variable estimator. Basically this involves subtracting the
within-group means from the IVs and the DV, adding their overall means
back in, and fitting the model. For OLS, this is equivalent to running
a regression with a dummy variable for each group. With many groups,
this gets computationally prohibitive, especially if you want to
bootstrap. Eventually I want to apply this procedure to all of gam's
splined terms, but for now I'm simply trying to modify gam's G object to
do so only for one parametrically-represented variable.
The G object feeds everything to gam needed to fit the model. The code
below modifies the design matrix, the DV, and the formula. My
question: what else needs to change in G to make the FE estimator
equivalent to the dummy variable estimator? Unfortunately I can't find
any technical information or well-commented code describing how gam
works in this regard, nor good descriptions of what the constituents of
G are. Nor can I look at the gam.setup function -- I think that it is
compiled in C (or something). Responses that can make the estimate of
the coefficient on x (below) equal for the two models would be really
welcome, as would pointers to useful documentation.
rm(list=ls())
library(mgcv)
library(plyr)
x = rexp(100)
ID = ceiling(runif(100)*10)
y = exp(x)+rnorm(100)+ID
t = ceiling(runif(100)*5)
data = data.frame(y,x,ID,t)
data = data[with(data,order(ID,t)),]
m = gam(y~x+as.factor(ID),fit=FALSE)
transform = function(G,ID){
rep.row = function(x,n){matrix(rep(x,each=n),nrow=n)}
X = G$X[,grepl(colnames(data.frame(ID=ID)),colnames(G$X))==FALSE]
dmframe = as.data.frame(cbind(ID,X))
dm <- as.matrix(ddply(dmframe, "ID", colwise(scale, center = TRUE,
scale = FALSE)))
cm = rep.row(colMeans(dmframe[,2:ncol(dmframe)]),nrow(dmframe))
X = dm[,2:ncol(dm)]+cm
G$X = X
y = G$y
dmframe = as.data.frame(cbind(ID,y))
dm <- ddply(dmframe, "ID", colwise(scale, center = TRUE, scale =
FALSE))
dm = as.matrix(dm)
cm = mean(dmframe[,2])
y = dm[,2]+cm
G$y = y
spec = as.character(G$formula)[3]; n=nchar(spec)
G$formula =
as.formula(paste(as.character(G$formula)[2],as.character(G$formula)[1],substr(spec,start=1,stop=n-15)))
}
#m = gam(y~s(x,k=7)+as.factor(ID),fit=FALSE,sp=0)
out = transform(m,data$ID)
G=m; ID=data$ID
mfe = gam(out)
summary(mfe)
mlsdv = gam(y~x+as.factor(ID))
#mlsdv = gam(y~s(x,k=7)+as.factor(ID),sp=0)
summary(mlsdv)
#The coefficient on x should be equal for the two models. The intercept
should represent the average of the fixed effects.
cbind(as.matrix(t(mlsdv$coef))[,grepl(colnames(data.frame(ID=ID)),colnames(as.matrix(t(mlsdv$coef))))==FALSE],
mfe$coef)
More information about the R-help
mailing list