[R] Multiple regression (with interactions) by hand

Joshua Wiley jwiley.psych at gmail.com
Wed Sep 4 12:07:05 CEST 2013


Hi Christoph,

ginv() computes the Moore-Penrose generalized inverse by way of a
singular value decomposition.  Part of the calculation involves taking
the reciprocal of the non zero values.  In practice, non zero is
really "within some precision tolerance of zero".  Numerical precision
can bite you in scientific computing.

There are many examples where the most conceptually straightforward
approach is not the best approach because whereas the equation may be
easy to write symbolically, it is more vulnerable to rounding or
truncation errors that occur in floating point representations.

Aside from working through some matrix algebra for understanding,
using established code (like lm) for models where the authors will
have taken issues like numerical precision and stability into
consideration is generally safest.

Cheers,

Josh



On Tue, Sep 3, 2013 at 6:22 AM, Christoph Scherber
<Christoph.Scherber at agr.uni-goettingen.de> wrote:
> Dear all,
>
> But why are there such huge differences betwen solve() and ginv()? (see code below)?
>
> ##
> m1=lm(Ozone~Solar.R*Wind,airquality)
>
> # remove NA´s:
> airquality2=airquality[complete.cases(airquality$Ozone)&
> complete.cases(airquality$Solar.R)&
> complete.cases(airquality$Wind),]
>
> # create the model matrix by hand:
> X=cbind("(Intercept)"=1,Solar.R=airquality2$Solar.R,Wind=airquality2$Wind,"Solar.R:Wind"=airquality2$Solar.R*airquality2$Wind)
> # is the same as:
> model.matrix(m1)
> # create the response vector by hand:
> Y=airquality2$Ozone
> # is the same as:
> m1$model$Ozone
> # Now solve for the parameter estimates:
>
> solve(crossprod(X)) %*% crossprod(X,Y) #gives the correct answer
>
> library(MASS)
> ginv(t(X)%*%X)%*%t(X)%*%Y #gives a wrong answer
>
>
>
>
>
> Am 03/09/2013 12:29, schrieb Joshua Wiley:
>> Hi Christoph,
>>
>> Use this matrix expression instead:
>>
>> solve(crossprod(X)) %*% t(X) %*% Y
>>
>> Note that:
>>
>> all.equal(crossprod(X), t(X) %*% X)
>>
>> Cheers,
>>
>> Joshua
>>
>>
>>
>> On Tue, Sep 3, 2013 at 2:51 AM, Christoph Scherber
>> <Christoph.Scherber at agr.uni-goettingen.de> wrote:
>>> Dear all,
>>>
>>> I´ve played around with the "airquality" dataset, trying to solve the matrix equations of a simple
>>> multiple regression by hand; however, my matrix multiplications don´t lead to the estimates returned
>>> by coef(). What have I done wrong here?
>>>
>>> ##
>>> m1=lm(Ozone~Solar.R*Wind,airquality)
>>>
>>> # remove NA´s:
>>> airquality2=airquality[complete.cases(airquality$Ozone)&
>>> complete.cases(airquality$Solar.R)&
>>> complete.cases(airquality$Wind),]
>>>
>>> # create the model matrix by hand:
>>> X=cbind("(Intercept)"=1,Solar.R=airquality2$Solar.R,Wind=airquality2$Wind,"Solar.R:Wind"=airquality2$Solar.R*airquality2$Wind)
>>> # is the same as:
>>> model.matrix(m1)
>>>
>>> # create the response vector by hand:
>>> Y=airquality2$Ozone
>>> # is the same as:
>>> m1$model$Ozone
>>>
>>> # Now solve for the parameter estimates:
>>> library(MASS)
>>> ginv(t(X)%*%X)%*%t(X)%*%Y
>>>
>>> # is not the same as:
>>> coef(m1)
>>>
>>> ##
>>> Now why is my result (line ginv(...)) not the same as the one returned by coef(m1)?
>>>
>>> Thanks very much for your help!
>>>
>>> Best regards,
>>> Christoph
>>>
>>> [using R 3.0.1 on Windows 7 32-Bit]
>>>
>>>
>>>
>>>
>>>
>>> --
>>> PD Dr Christoph Scherber
>>> Georg-August University Goettingen
>>> Department of Crop Science
>>> Agroecology
>>> Grisebachstrasse 6
>>> D-37077 Goettingen
>>> Germany
>>> phone 0049 (0)551 39 8807
>>> fax 0049 (0)551 39 8806
>>> http://www.gwdg.de/~cscherb1
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://joshuawiley.com/
Senior Analyst - Elkhart Group Ltd.
http://elkhartgroup.com



More information about the R-help mailing list