# [R] Calculate Adjusted vcov Matrix acc. to Shanken(1992) (Generated Regressor Problem)

Philipp Grueber philipp.grueber at ebs.edu
Sat Nov 30 15:03:38 CET 2013

Dear R Users,

I wish to estimate a regression:
y=a+b x1+c x2+d x3+e
where a is the constant, b,c,d are coefficients and e represents the
residuals. However, I find x1 and x2 to correlate. In order to avoid
multicollinearity, I split up the estimation:

(1) x2~a1+ b1 x1 + e1
(2) y=a2+b2 x1+ c2 e1+ d2 x3+e2

I believe that using the residuals from regression (1) in (2) induces an
„generated regressor problem“ which biases the standard errors obtained (I
am not quite sure because both x1 and the error e1 both are included in my
second regression). (Shanken1992) provides an adjustment which however
refers to a panel setup. Obviously, my setup is unrelated to panels.

Question:
If being applicable to my setup at all, is there an R implementation of
Shanken’s adjustment? See my example below. I try to replicate the
explanation provided in Ch.12 of John Chohcrane's "Asset Pricing" book (page
237ff).

Any help is highly appreciated. Thank you very much in advance!
Best wishes,
Philipp

PS: Note that a similar question was asked by someone else only few days ago
in the stackexchange network. See
http://stats.stackexchange.com/questions/77617/shanken-1992-correction-for-t-statistics

##################################################################
##################################################################
##################################################################

#packages
library(lmtest)

#Data
y<-rnorm(100)
x1<-rnorm(100)
x2<-rnorm(100)
x3<-rnorm(100)

#I wish to estimate y=c+x1+x2+x3+u but cor(x1,x2) is high. Therefore twostep
procedure: (1) x2~c1+x1+u1 (2) y=c2+x1+u1+x3+u2.
res1<-lm(x2~x1)
coeftest(res1,vcov.=vcov(res1))
vcov(res1)

#I am able to calculate this manually.
X1<-as.matrix(data.frame(A=rep(1,100),B=x1),stringsAsFactors=FALSE)
betas<-solve(crossprod(X1,X1))%*%t(X1)%*%x2
round(coef(res1),digits=10)==round(betas,digits=10)
betas
u1<-resid(res1)
n1<-length(y)
k1<-ncol(X1)
vcov1 = 1/(n1-k1) * as.numeric(t(u1)%*%u1) * solve(t(X1)%*%X1) #= 1/T
vcov1

#Now I do my second regression:
Y<-as.matrix(y)
X2<-as.matrix(data.frame(A=1,B=x1,C=u1,D=x3))
res2<-lm(y~x1+u1+x3)
coeftest(res2,vcov.=vcov(res2))
vcov(res2)

#This is the (biased) variance-covariance matrix:
u2<-resid(res2)
n2<-length(y)
k2<-ncol(X2)
vcov2 = 1/(n2-k2) * as.numeric(t(u2)%*%u2) * solve(t(X2)%*%X2) #= 1/T
vcov2

##############
##############
# So far, this was easy, but in order to implement the Shanken(1992)
adjustment, I need to calculate the variance-covariance matrix manually in
the following way: (See Chohcrane (2005) Asset Pricing Ch.12):
var_coef<-function(x){
x_mm<-model.matrix(x)
x_s2<-summary(x)\$sigma^2
solve(t(x_mm)%*%x_mm)%*%t(x_mm)%*%(x_s2*diag(nobs(x)))%*%x_mm%*%solve(t(x_mm)%*%x_mm)
}
vcov_manual<-var_coef(x=res1)
vcov1
round(vcov1,digits=10)==round(vcov_manual,digits=10)

##################################
# This is where I begin to question whether Shanken's adjustment makes sense
in my setup at all.
##################################
#I calculate the covariance matrix of the residuals cov(u,u')   (note: I am
not sure what it actually tells me / whether it makes sense in my setup)
cov_resid<-function(x){
x_mm<-model.matrix(x)
x_s2<-summary(x)\$sigma^2
(diag(nobs(x))-x_mm%*%solve(t(x_mm)%*%x_mm)%*%t(x_mm))%*%(x_s2*diag(nobs(x)))%*%t(diag(nobs(x))-x_mm%*%solve(t(x_mm)%*%x_mm)%*%t(x_mm))
}
cov_resid<-cov_resid(x=res1)
cov_resid[95:100,95:100] #to show only a part

#Another way to go to:
H1<-X1%*%solve(t(X1)%*%X1)%*%t(X1)
((diag(100)-H1)*summary(res1)\$sigma^2)[95:100,95:100]

round(((diag(100)-H1)*summary(res1)\$sigma^2)[95:100,95:100],digits=10)==round(cov_resid[95:100,95:100],digits=10)

##################################
# This is where I really struggle. I do not know how to get cov(x1,x1').
##################################

#The error covariance of the second regression should be
H1<-X1%*%solve(t(X1)%*%X1)%*%t(X1)
sigma<-((diag(100)-H1)*summary(res1)\$sigma^2)
sigma_f<-"???"

#Since the variance-covariance matrix of the second regression should be
vcov2=1/T*((b'b)^(-1)b'Sb(b'b)-sigma_f) I assume:
x=X2
sigma2_f<-(solve(t(x)%*%x)%*%t(x)%*%(sigma)%*%x%*%solve(t(x)%*%x))-vcov2*100
#But then the error covariance cannot be calculated as in Chochrane:
err_cov<-1/100*(coef(res1)%*%sigma2_f%*%coef(res1)+sigma)

##################################
# And then?
##################################

#Given I was able to derive sigma and sigma_f correctly (and that they made
sense), I could then include the Shanken adjustment:
...*(1+t(lambdas)%*%solve(sig_f)%*%lambdas)...
lambdas<-coef(x)
x_mm<-model.matrix(x)
x_s2<-summary(x)\$sigma^2
1/100*(solve(t(x_mm)%*%x_mm)%*%t(x_mm)%*%(sig)%*%x_mm%*%solve(t(x_mm)%*%x_mm)
*(1+t(lambdas)%*%solve(sig_f)%*%lambdas) +sig_f)
}
#...and thus, hope to obtain the corrected variance-covariance matrix for
the second regression

##################################################################
##################################################################
##################################################################

-----
____________________________________
EBS Universitaet fuer Wirtschaft und Recht
FARE Department