[R] Sampling from multivariate multiple regression prediction regions
Iain Pardoe
ipardoe at lcbmail.uoregon.edu
Mon May 9 18:43:36 CEST 2005
I'd like to sample multiple response values from a multivariate
regression fit. For example, suppose I have m=2 responses (y1,y2) and a
single set of predictor variables (z1,z2). Each response is assumed to
follow its own regression model, and the error terms in each model can
be correlated (as in example 7.10 of section 7.7 of Johnson/Wichern):
> ex7.10 <-
+ data.frame(y1 = c(141.5, 168.9, 154.8, 146.5, 172.8, 160.1, 108.5),
+ y2 = c(301.8, 396.1, 328.2, 307.4, 362.4, 369.5, 229.1),
+ z1 = c(123.5, 146.1, 133.9, 128.5, 151.5, 136.2, 92),
+ z2 = c(2.108, 9.213, 1.905, .815, 1.061, 8.603, 1.125))
> attach(ex7.10)
> f.mlm <- lm(cbind(y1,y2)~z1+z2)
> y.hat <- c(1, 130, 7.5) %*% coef(f.mlm)
> round(y.hat, 2)
         y1     y2
[1,] 151.84 349.63
> qf.z <- t(c(1, 130, 7.5)) %*%
+ solve(t(cbind(1,z1,z2)) %*% cbind(1,z1,z2)) %*%
+ c(1, 130, 7.5)
> round(qf.z, 5)
        [,1]
[1,] 0.36995
> n.sigma.hat <- SSD(f.mlm)$SSD # same as t(resid(f.mlm)) %*% resid(f.mlm)
> round(n.sigma.hat, 2)
     y1    y2
y1 5.80  5.22
y2 5.22 12.57
> F.quant <- qf(.95,2,3)
> round(F.quant, 2)
[1] 9.55
This gives me all the information I need to calculate a 95% prediction
ellipse for y=(y1,y2) at (z1,z2)=(130,7.5) using JW's equation (7-48),
with n=7 observations, r=2 predictors and m=2 responses (written using R
syntax, although R cannot "literally" calculate this as it is written):
(y-y.hat) %*% ((n-r-1) * solve(n.sigma.hat)) %*% t(y-y.hat)
<= (1+qf.z) * (m*(n-r-1)/(n-r-m)) * F.quant
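
For a specific candidate point y, the inequality can be evaluated
directly. The following is only a minimal sketch: it assumes n=7, r=2
and m=2 (as implied by the data and by qf(.95,2,3) above), and the
function name in.region is hypothetical, not something from JW or from
any package.

```r
# Data and fit from the example above (passed via data= instead of attach())
ex7.10 <- data.frame(
  y1 = c(141.5, 168.9, 154.8, 146.5, 172.8, 160.1, 108.5),
  y2 = c(301.8, 396.1, 328.2, 307.4, 362.4, 369.5, 229.1),
  z1 = c(123.5, 146.1, 133.9, 128.5, 151.5, 136.2, 92),
  z2 = c(2.108, 9.213, 1.905, .815, 1.061, 8.603, 1.125))
f.mlm <- lm(cbind(y1, y2) ~ z1 + z2, data = ex7.10)

n <- nrow(ex7.10)  # 7 observations
r <- 2             # predictors (excluding the intercept); assumed
m <- 2             # responses; assumed
z0 <- c(1, 130, 7.5)

Z <- model.matrix(f.mlm)
y.hat <- drop(z0 %*% coef(f.mlm))
qf.z <- drop(t(z0) %*% solve(crossprod(Z)) %*% z0)
n.sigma.hat <- crossprod(resid(f.mlm))
F.quant <- qf(.95, m, n - r - m)

# Right-hand side of the inequality
rhs <- (1 + qf.z) * (m * (n - r - 1) / (n - r - m)) * F.quant

# Hypothetical helper: TRUE if the point y = c(y1, y2) lies in the region
in.region <- function(y) {
  d <- y - y.hat
  lhs <- drop(t(d) %*% ((n - r - 1) * solve(n.sigma.hat)) %*% d)
  lhs <= rhs
}

in.region(y.hat)  # the centre of the ellipse is always inside
```

Here y is treated as a plain numeric vector of length 2, so the
transposes are arranged slightly differently from the expression above,
but the quadratic form is the same.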
But what if I'd instead like to sample (y1,y2) values from this
distribution? I can sample from an F(m,n-r-m) distribution easily
enough, but then how can I transform this to a single point in (y1,y2)
space?
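
One possible way to turn an F draw into a point is sketched below. The F
draw only fixes the value of the quadratic form, i.e. which ellipse
contour the point lies on, so a direction is needed as well. The sketch
assumes the direction is uniform on the unit circle in the coordinates
where n.sigma.hat is the identity; that assumption is not stated
anywhere above. Under it, the draws follow a bivariate t distribution
with n-r-m degrees of freedom, centred at y.hat, with scale matrix
(1+qf.z) * n.sigma.hat / (n-r-m). The function name sample.pred and the
values n=7, r=2, m=2 are likewise assumptions made for illustration.

```r
# Data and fit from the example above
ex7.10 <- data.frame(
  y1 = c(141.5, 168.9, 154.8, 146.5, 172.8, 160.1, 108.5),
  y2 = c(301.8, 396.1, 328.2, 307.4, 362.4, 369.5, 229.1),
  z1 = c(123.5, 146.1, 133.9, 128.5, 151.5, 136.2, 92),
  z2 = c(2.108, 9.213, 1.905, .815, 1.061, 8.603, 1.125))
f.mlm <- lm(cbind(y1, y2) ~ z1 + z2, data = ex7.10)

n <- nrow(ex7.10); r <- 2; m <- 2   # assumed dimensions
z0 <- c(1, 130, 7.5)
y.hat <- drop(z0 %*% coef(f.mlm))
qf.z <- drop(t(z0) %*% solve(crossprod(model.matrix(f.mlm))) %*% z0)
n.sigma.hat <- crossprod(resid(f.mlm))
rhs <- (1 + qf.z) * (m * (n - r - 1) / (n - r - m)) * qf(.95, m, n - r - m)

# Hypothetical helper: draw N points (one per row) at z0
sample.pred <- function(N) {
  F.draw <- rf(N, m, n - r - m)           # one F draw per point
  u <- matrix(rnorm(N * m), N, m)
  u <- u / sqrt(rowSums(u^2))             # uniform directions on unit circle
  # Value of (y-y.hat)' solve(n.sigma.hat) (y-y.hat) implied by each F draw
  radius <- sqrt((1 + qf.z) * m / (n - r - m) * F.draw)
  L <- t(chol(n.sigma.hat))               # n.sigma.hat = L %*% t(L)
  sweep((radius * u) %*% t(L), 2, y.hat, "+")
}

set.seed(1)
y.new <- sample.pred(10000)

# Sanity check: share of draws inside the 95% ellipse, should be near 0.95
mean((n - r - 1) * mahalanobis(y.new, y.hat, n.sigma.hat) <= rhs)
```

By construction a draw falls inside the 95% ellipse exactly when its F
value is below the 95% quantile, so the final check mainly guards
against algebra slips in the scaling.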
Any ideas would be greatly appreciated. Thanks.
Iain Pardoe <ipardoe at lcbmail.uoregon.edu>
University of Oregon