[R] multivariate regression and lm()

John Fox jfox at mcmaster.ca
Fri Mar 16 12:42:29 CET 2012


Dear Ernest,

The ML estimator for the mulitvariate linear model assuming multinormal errors is the same as equation-by-equation LS, but multivariate tests performed for all of the responses on the resulting multivariate linear model object (e.g., by anova() or Anova() in the car package) will take the correlations of the responses into account.

I hope that this helps,
 John

------------------------------------------------
John Fox
Sen. William McMaster Prof. of Social Statistics
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/

On Fri, 16 Mar 2012 07:32:02 -0400
 Ernest Lo <treemake at gmail.com> wrote:
> Hello,
> 
> I would like to perform a multivariate regression analysis to model the
> relationship between m responses Y1, ... Ym and a single set of predictor
> variables X1, ..., Xr.  Each response is assumed to follow its own
> regression model, and the error terms in each model can be correlated.
> 
> Based on my readings of the R help archives and R documentation, the
> function lm() should be able to perform this analysis.  However my tests of
> lm() show that it really just fits m separate regression models (without
> accounting for any possible correlation between the response variables).
> 
> I have attached an example below that demonstrates that the multivariate
> analysis result produced by lm() is identical to running separate
> regression analyses on each of the Y1, ... Ym response variables, which is
> not the desired objective.
> 
> Can anyone confirm if lm() is indeed capable of performing multivariate
> regression analysis, and if so how?
> 
> Thanks very much,
> 
> Ernest
> 
> PS – my post is based on an earlier 2005 post where the same question was
> asked ... the conclusion at that time was that the function lm() is capable
> of handling the multivariate analysis.  However my tests (shown below)
> indicate that lm() actually seems to perform separate regressions for each
> of the response variables without accounting for their possible correlation.
> 
> 
> ### multivariate analysis using lm() ###
> ########################################
> 
> > ex7.8 <- data.frame(z1 = c(0, 1, 2, 3, 4), y1 = c(1, 4, 3, 8, 9), y2 =
> c(-1, -1, 2, 3, 2))
> >
> > f.mlm <- lm(cbind(y1, y2) ~ z1, data = ex7.8)
> > summary(f.mlm)
> Response y1 :
> 
> Call:
> lm(formula = y1 ~ z1, data = ex7.8)
> 
> Residuals:
>  1  2  3  4  5
>  0  1 -2  1  0
> 
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)   1.0000     1.0954   0.913   0.4286
> z1            2.0000     0.4472   4.472   0.0208 *
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> 
> Residual standard error: 1.414 on 3 degrees of freedom
> Multiple R-squared: 0.8696,     Adjusted R-squared: 0.8261
> F-statistic:    20 on 1 and 3 DF,  p-value: 0.02084
> 
> 
> Response y2 :
> 
> Call:
> lm(formula = y2 ~ z1, data = ex7.8)
> 
> Residuals:
>          1          2          3          4          5
> -1.110e-16 -1.000e+00  1.000e+00  1.000e+00 -1.000e+00
> 
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)  -1.0000     0.8944  -1.118   0.3450
> z1            1.0000     0.3651   2.739   0.0714 .
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> 
> Residual standard error: 1.155 on 3 degrees of freedom
> Multiple R-squared: 0.7143,     Adjusted R-squared: 0.619
> F-statistic:   7.5 on 1 and 3 DF,  p-value: 0.07142
> 
> 
> ### separate linear regressions on y1 and y2 ###
> ################################################
> 
> > f.mlm1 = lm(y1 ~ z1, data=ex7.8); summary(f.mlm1)
> 
> Call:
> lm(formula = y1 ~ z1, data = ex7.8)
> 
> Residuals:
>  1  2  3  4  5
>  0  1 -2  1  0
> 
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)   1.0000     1.0954   0.913   0.4286
> z1            2.0000     0.4472   4.472   0.0208 *
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> 
> Residual standard error: 1.414 on 3 degrees of freedom
> Multiple R-squared: 0.8696,     Adjusted R-squared: 0.8261
> F-statistic:    20 on 1 and 3 DF,  p-value: 0.02084
> 
> > f.mlm2 = lm(y2 ~ z1, data=ex7.8); summary(f.mlm2)
> 
> Call:
> lm(formula = y2 ~ z1, data = ex7.8)
> 
> Residuals:
>          1          2          3          4          5
> -1.110e-16 -1.000e+00  1.000e+00  1.000e+00 -1.000e+00
> 
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)  -1.0000     0.8944  -1.118   0.3450
> z1            1.0000     0.3651   2.739   0.0714 .
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> 
> Residual standard error: 1.155 on 3 degrees of freedom
> Multiple R-squared: 0.7143,     Adjusted R-squared: 0.619
> F-statistic:   7.5 on 1 and 3 DF,  p-value: 0.07142
> 
> 	[[alternative HTML version deleted]]
>



More information about the R-help mailing list