[R] applying lm on an array of observations with common design matrix

Ranjan Maitra maitra at iastate.edu
Thu Feb 22 20:21:32 CET 2007


On Thu, 22 Feb 2007 08:17:38 +0000 (GMT) Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:

> On Thu, 22 Feb 2007, Petr Klasterecky wrote:
> 
> > Ranjan Maitra napsal(a):
> >> On Sun, 18 Feb 2007 07:46:56 +0000 (GMT) Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
> >>
> >>> On Sat, 17 Feb 2007, Ranjan Maitra wrote:
> >>>
> >>>> Dear list,
> >>>>
> >>>> I have a 4-dimensional array Y of dimension 330 x 67 x 35 x 51. I have a
> >>>> design matrix X of dimension 330 x 4. I want to fit a linear regression
> >>>> of each
> >>>>
> >>>> lm(Y[, i, j, k] ~ X) for each i, j, k.
> >>>>
> >>>> Can I do it in one shot without a loop?
> >>> Yes.
> >>>
> >>> YY <- Y
> >>> dim(YY) <- c(330, 67*35*51)
> >>> fit <- lm(YY ~ X)
> >>>
> >>>> Actually, I am also interested in getting the p-values of some of the
> >>>> coefficients -- lets say the coefficient corresponding to the second
> >>>> column of the design matrix. Can the same be done using array-based
> >>>> operations?
> >>> Use lapply(summary(fit), function(x) coef(x)[3,4])  (since there is an
> >>> intercept, you want the third coefficient).
> >>
> >> In this context, can one also get the variance-covariance matrix of the 
> >> coefficients?
> >
> > Sure:
> >
> > lapply(summary(fit), function(x) {"$"(x,cov.unscaled)})
> 
> But that is not the variance-covariance matrix (and it is an unusual way 
> to write x$cov.unscaled)!
> 
> > Add indexing if you do not want the whole matrix. You can extract
> > whatever you want, just take a look at ?summary.lm, section Value.
> 
> It is unclear to me what the questioner expects: the estimated 
> coefficients for different responses are independent.  For a list of 
> matrices applying to each response one could mimic vcov.lm and do
> 
> lapply(summary(fit, corr=FALSE),
>         function(so) so$sigma^2 * so$cov.unscaled)
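Putting the suggestions above together, here is a minimal runnable sketch. A small simulated array stands in for the real 330 x 67 x 35 x 51 data so it runs quickly; the dimensions and design matrix are placeholders, not the original data:

```r
set.seed(1)
n <- 30                           # observations (330 in the original problem)
d <- c(2, 3, 2)                   # small stand-in for the 67 x 35 x 51 grid
X <- matrix(rnorm(n * 3), n, 3)   # design matrix (intercept added by lm)
Y <- array(rnorm(n * prod(d)), dim = c(n, d))

# Flatten the trailing dimensions so lm() fits all responses in one call
YY <- Y
dim(YY) <- c(n, prod(d))
fit <- lm(YY ~ X)

# Per-response p-value for the coefficient of the second column of X
# (row 3 of the coefficient table, because of the intercept)
pvals <- sapply(summary(fit), function(s) coef(s)[3, 4])

# Per-response variance-covariance matrix of the coefficients,
# mimicking what vcov.lm does for a single response
vcovs <- lapply(summary(fit), function(so) so$sigma^2 * so$cov.unscaled)
```

Each element of `vcovs` is a 4 x 4 matrix (intercept plus three slopes) for the corresponding flattened response.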

Thanks! Actually, I am really looking to compare two of the coefficients (let us say the second and the third), i.e. test beta2 - beta3 = 0 for each regression. Basically, I want the two-sided p-value of the test statistic for each regression.

One way of doing that is to get the dispersion matrix of each regression and then to compute the t-statistic and the p-value. That is the genesis of the question above. Is there a better way?
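That contrast-based route can be sketched as follows. This is one illustrative way to do it, not the only one: the contrast vector `cc` below picks out an example pair of coefficients (here the two slopes after the intercept), and the indices would need adjusting for whichever pair is actually of interest:

```r
set.seed(2)
n <- 40
X <- matrix(rnorm(n * 3), n, 3)
YY <- matrix(rnorm(n * 5), n, 5)   # 5 responses as a small stand-in
fit <- lm(YY ~ X)
sums <- summary(fit)

# Contrast c' beta picking out beta2 - beta3 among the 4 coefficients
# (intercept, X1, X2, X3); change the indices for a different pair
cc <- c(0, 1, -1, 0)
df.res <- fit$df.residual

pvals <- sapply(sums, function(so) {
  est <- sum(cc * coef(so)[, 1])              # estimated contrast c' beta-hat
  V <- so$sigma^2 * so$cov.unscaled           # dispersion matrix of beta-hat
  se <- sqrt(drop(t(cc) %*% V %*% cc))        # standard error of the contrast
  2 * pt(abs(est / se), df.res, lower.tail = FALSE)  # two-sided p-value
})
```

The t-statistic is `c' beta-hat / se(c' beta-hat)` with the residual degrees of freedom of the fit, which is exactly the computation described above done response by response.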

Many thanks and best wishes,
Ranjan


