[R] lm/model.matrix confusion (? bug)

Mark Difford mark_difford at yahoo.co.uk
Wed Dec 12 18:56:27 CET 2007


Hi Gavin,

> ... In this instance, you don't want to be working with lm.  You can use
> lm.fit which returns an object with $coefficients, so I would guess you
> need something like this (not tested) ... & more ...

That's a great help, many thank's for walking me through it; that's the
stuff I need to get me through to the next level.

Best Regards,
Mark.


Gavin Simpson wrote:
> 
> 
> On Wed, 2007-12-12 at 06:46 -0800, Mark Difford wrote:
>> Hi Gavin, Berwin,
>> 
>> Thanks for your detailed replies.  I'll make a general reply, if you
>> don't
>> mind.
>> 
>> To reiterate, my main point is that if model.matrix() can be used in this
>> way, then lm() shouldn't add an intercept.
>> 
>> >> ... lm(Gas ~ model.matrix(~ Insul/Temp - 1), data = whiteside) ....
>> 
>> And the documentation for lm() indicates. albeit indirectly, that
>> model.matrix can be used in this way.  It calls for a formula, or
>> something
>> that can be coerced to one.  And the following meets that criterion:
>> as.formula(Gas ~ model.matrix (~ Insul/Temp-1, data=whiteside)), and this
>> specifies no intercept.
> 
> This is the problem - it doesn't! 
> 
> What is says is model.matrix() don't add an intercept to the returned
> model matrix. R then tries to work out what to do with the returned
> matrix that is now on the rhs of the formula. But note that at no point
> have you gotten round the fact that it will add an intercept *after* the
> formula is parsed and a -1 is not found. This is because model.matrix is
> called again, internally in lm but not with your formula as an argument
> but with a terms object and nowhere in this object is it specified not
> to include an intercept.
> 
> To do this, lm would have to parse the formula for model.matrix(....)
> and if it finds something work accordingly.
> 
>> 
>> On the question of why I want to mess about in such a labarynthine way. 
>> Well my email was largely expository.  With a straight call to lm(), I
>> wouldn't bother with model.matrix.
>> 
>> So, it really was about getting at lm coefficients inside a function
>> (when
>> you have to get the terms &c. from somewhere else), and trying to
>> understand
>> properly how things work, and why they work the way they do, and even if
>> they should work the way they do.
>> 
>> For instance:---
>> 
>>     if (ols) {
>>         obj <- x[[1]]
>>         mt <- terms(obj)
>>         mf <- model.frame(obj)
>>         y <- model.response(mf)
>>         X <- model.matrix(mt, mf, contrasts = obj$contrasts)
>>         if (attr(mt, "intercept") == 1)                    ## This is my
>> hack to overcome the double-intercept problem
>>             { olscf <- summary(lm(y ~ X))$coefficients }
>>         else {
>>             olscf <- summary(lm(y ~ X - 1))$coefficients
>>         }
>>         rownames(olscf) <- rownames(coef(obj))
> 
> In this instance, you don't want to be working with lm. You can use
> lm.fit which returns an object with $coefficients, so I would guess you
> need something like this (not tested)
> 
>     if (ols) {
>         obj <- x[[1]]
>         mt <- terms(obj)
>         mf <- model.frame(obj)
>         y <- model.response(mf)
>         X <- model.matrix(mt, mf, contrasts = obj$contrasts)
> 	fit <- lm.fit(X, y) ## store fit in case you need something else
> 	olscf <- coef(fit)  ## get coefficients using an extractor function
>         ##if (attr(mt, "intercept") == 1)  ## This is my hack to overcome
> the double-intercept problem
>         ##    { olscf <- summary(lm(y ~ X))$coefficients }
>         ##else {
>         ##    olscf <- summary(lm(y ~ X - 1))$coefficients
>         ##}
>         ##rownames(olscf) <- rownames(coef(obj))
> }
> 
> Note also, that even if you are using your hack, you don't need to do
> 
> summary(lm(y ~ X))$coefficients
> 
> as
> 
> coef(lm(y ~ X))
> 
> will do.
> 
> lm is mainly there for top-level work. If you need to do things like you
> have above, use lm.fit or do the qr decomposition yourself.
> 
> Or try a different way to build the formula you need and work with lm;
> Bill Venables has a nice piece in R News Vol 2 Issue 2 in the
> Programmer's Niche section on something that might be of use if you want
> to build a correct formula for your needs from the colnames of X and y?
> 
> All the best,
> 
> G
> 
>> 
>> Thanks again for your input.
>> 
>> Regards,
>> Mark.
>> 
>> 
>> 
>> 
>> 
>> Berwin A Turlach wrote:
>> > 
>> > G'day Mark,
>> > 
>> > On Wed, 12 Dec 2007 02:05:54 -0800 (PST)
>> > Mark Difford <mark_difford at yahoo.co.uk> wrote:
>> > 
>> >> In order to get the same coefficients as we get from the following
>> > [...] 
>> >> we need to do the following (if we use model.matrix to specify the
>> >> model)
>> > 
>> > By why would you want to do this?
>> > 
>> >> ##
>> >> summary ( lm(Gas ~ model.matrix(~ Insul/Temp - 1) - 1, data =
>> >> whiteside) )
>> >> 
>> >> That is, we need to take out "two intercepts."  Is this "correct"?
>> > 
>> > Yes.
>> >  
>> >> Shouldn't lm check to see if an intercept has been requested as part
>> >> of the model formula?
>> > 
>> > No, it does not.  In the Details section of lm's help page you will
>> > find the following:
>> > 
>> >      A formula has an implied intercept term.  To remove this use
>> >      either 'y ~ x - 1' or 'y ~ 0 + x'.  See 'formula' for more details
>> >      of allowed formulae.
>> > 
>> > Thus, except if you explicitly ask for a constant term not be included,
>> > lm will add a constant term (a column of ones) additionally to what
>> > ever you have specified on the right hand side of the formula.
>> > 
>> >> If I do
>> >> ##
>> >> summary(lm(as.formula(Gas ~ model.matrix (~ Insul/Temp-1,
>> >> data=whiteside)), data=whiteside))
>> >> 
>> >> we get a strange model.  
>> > 
>> > Well, you get a model in which not all parameters are identifiable, and
>> > a particular parameter that is not identifiable is estimated by NA.  I
>> > am not sure what is strange about this.
>> > 
>> >> But the formula part of this model qualifies
>> >> as a valid formula
>> >> ##
>> >> as.formula(Gas ~ model.matrix (~ Insul/Temp-1, data=whiteside))
>> > 
>> > Debatable, the above command only shows that it can be coerced into a
>> > valid formula. :)
>> > 
>> >> just as if I were to write: lm(Gas ~ Insul/Temp - 1, data=whiteside)
>> >> 
>> >> But we know that the _correct_ formula is the following
>> >  
>> >> ##
>> >> as.formula(Gas ~ model.matrix (~ Insul/Temp-1, data=whiteside) -1)
>> > 
>> > Why is this formula any more correct than the other one?  Both specify
>> > exactly the same model.  It is just that one does it in an
>> > overparameterised way.
>> > 
>> >> (Sorry, this is getting really long) --- So, my question/confusion
>> >> comes down to wanting to know why lm() doesn't check to see if an
>> >> intercept has been specified when the model has been specified using
>> >> model.matrix.
>> > 
>> > Because lm() is documented not to check this.  If you do not want to
>> > have an intercept in the model you have to specifically ask it for.
>> > 
>> > Also, comparing the output of 
>> > 	summary( lm(Gas ~ Insul/Temp - 1, data = whiteside) )
>> > and
>> > 	summary( lm(Gas ~ Insul/Temp, data = whiteside ) )
>> > 
>> > you can see that lm() does not check whether there is an implicit
>> > intercept in the model.  Compare the (Adjusted) R-squared values
>> > returned; one case is using the formula for models with no intercept
>> > the other one the formula for models with intercept.  Similar story
>> > with the reported F-statistics.  
>> > 
>> > Cheers,
>> > 
>> > 	Berwin
>> > 
>> > =========================== Full address =============================
>> > Berwin A Turlach                            Tel.: +65 6515 4416 (secr)
>> > Dept of Statistics and Applied Probability        +65 6515 6650 (self)
>> > Faculty of Science                          FAX : +65 6872 3919       
>> > National University of Singapore     
>> > 6 Science Drive 2, Blk S16, Level 7          e-mail: statba at nus.edu.sg
>> > Singapore 117546                    http://www.stat.nus.edu.sg/~statba
>> > 
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> > 
>> > 
>> 
> -- 
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>  Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
>  ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
>  Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
>  Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
>  UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> 

-- 
View this message in context: http://www.nabble.com/lm-model.matrix-confusion-%28--bug%29-tp14292188p14300313.html
Sent from the R help mailing list archive at Nabble.com.



More information about the R-help mailing list