# [R] Matrix algebra in R to compute coefficients of a linear regression.

Frank Harrell f.harrell at vanderbilt.edu
Sat Feb 18 16:20:48 CET 2012

```The Wilcoxon test is not related to the difference in medians but rather to
the Hodges-Lehmann estimator, which is the median of all possible
differences of observations between sample 1 and sample 2.  So what's needed
is a confidence interval for this estimate, obtainable from inverting the
Wilcoxon test.
Frank

John Sorkin wrote
>
> Thank you,
> John
>
> John David Sorkin M.D., Ph.D.
> Chief, Biostatistics and Informatics
> University of Maryland School of Medicine Division of Gerontology
> Baltimore VA Medical Center
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
> (Phone) 410-605-7119
> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
>
>>>> Berend Hasselman <bhh@> 2/18/2012 9:05 AM >>>
>
> On 18-02-2012, at 14:36, John Sorkin wrote:
>
>> I am trying to use matrix algebra to get the beta coefficients from a
>> simple bivariate linear regression, y=f(x).
>> The coefficients should be computable using the following matrix algebra:
>> t(X)Y / t(x)X
>>
>> I have pasted the code I wrote below. I clearly odes not work both
>> because it returns a matrix rather than a vector containing two elements
>> the beta for the intercept and the beta for x, and because the values
>> produced by the matrix algebra are not the same as those returned by the
>> linear regression. Can someone tell we where I have gone wrong, either in
>> my use of matrix algebra in R, or perhaps at a more fundamental
>> theoretical level?
>> Thanks,
>> John
>>
>> # Define intercept, x and y.
>> int <- rep(1,100)
>> x   <- 1:100
>> y   <- 2*x + rnorm(100)
>>
>> # Create a matrix to hold values.
>> data           <- matrix(nrow=100,ncol=3)
>> dimnames(data) <- list(NULL,c("int","x","y"))
>> data[,"int"] <- int
>> data[,"x"]   <- x
>> data[,"y"]   <- y
>> data
>>
>> # Compute numerator.
>> num <-  cov(data)
>> num
>>
>> # Compute denominator
>> denom <- solve(t(data) %*% data)
>> denom
>>
>> # Compute betas, [t(X)Y]/[t(X)Y]
>> betaRon <-    num %*% denom
>> betaRon
>>
>> # Get betas from regression so we can check
>> # values obtaned by matrix algebra.
>> fit0 <- lm(y~x)
>>
>
> data should only contain the independent (righthand side) variables.
>
> So
>
> # Create a matrix to hold values.
> data           <- matrix(nrow=100,ncol=2)
> dimnames(data) <- list(NULL,c("int","x"))#,"y"))
> data[,"int"] <- int
> data[,"x"]   <- x
>
> You should get correct results.
> A better way to calculate the beta's is to calculate them directly without
> explicitly calculating an invers
>
> # Better
> # Compute betas, from t(X)X beta = t(Z) y
> # without computing inverse explicitly
> betaAlt <- solve(t(data) %*% data, num)
> betaAlt
>
> Even better is the method lm uses without forming t(data) %*%  data
> explicitly but using a (pivoted) QR decomposition of data.
>
> Berend
>
>
>
>
>
> Confidentiality Statement:
> This email message, including any attachments, is for th...{{dropped:6}}
>
> ______________________________________________
> R-help@ mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: http://r.789695.n4.nabble.com/CI-for-the-median-difference-tp4399508p4399897.html
Sent from the R help mailing list archive at Nabble.com.

```