[R] lm/model.matrix confusion (? bug)
Mark Difford
mark_difford at yahoo.co.uk
Wed Dec 12 18:56:27 CET 2007
Hi Gavin,
> ... In this instance, you don't want to be working with lm. You can use
> lm.fit which returns an object with $coefficients, so I would guess you
> need something like this (not tested) ... & more ...
That's a great help, many thank's for walking me through it; that's the
stuff I need to get me through to the next level.
Best Regards,
Mark.
Gavin Simpson wrote:
>
>
> On Wed, 2007-12-12 at 06:46 -0800, Mark Difford wrote:
>> Hi Gavin, Berwin,
>>
>> Thanks for your detailed replies. I'll make a general reply, if you
>> don't
>> mind.
>>
>> To reiterate, my main point is that if model.matrix() can be used in this
>> way, then lm() shouldn't add an intercept.
>>
>> >> ... lm(Gas ~ model.matrix(~ Insul/Temp - 1), data = whiteside) ....
>>
>> And the documentation for lm() indicates. albeit indirectly, that
>> model.matrix can be used in this way. It calls for a formula, or
>> something
>> that can be coerced to one. And the following meets that criterion:
>> as.formula(Gas ~ model.matrix (~ Insul/Temp-1, data=whiteside)), and this
>> specifies no intercept.
>
> This is the problem - it doesn't!
>
> What is says is model.matrix() don't add an intercept to the returned
> model matrix. R then tries to work out what to do with the returned
> matrix that is now on the rhs of the formula. But note that at no point
> have you gotten round the fact that it will add an intercept *after* the
> formula is parsed and a -1 is not found. This is because model.matrix is
> called again, internally in lm but not with your formula as an argument
> but with a terms object and nowhere in this object is it specified not
> to include an intercept.
>
> To do this, lm would have to parse the formula for model.matrix(....)
> and if it finds something work accordingly.
>
>>
>> On the question of why I want to mess about in such a labarynthine way.
>> Well my email was largely expository. With a straight call to lm(), I
>> wouldn't bother with model.matrix.
>>
>> So, it really was about getting at lm coefficients inside a function
>> (when
>> you have to get the terms &c. from somewhere else), and trying to
>> understand
>> properly how things work, and why they work the way they do, and even if
>> they should work the way they do.
>>
>> For instance:---
>>
>> if (ols) {
>> obj <- x[[1]]
>> mt <- terms(obj)
>> mf <- model.frame(obj)
>> y <- model.response(mf)
>> X <- model.matrix(mt, mf, contrasts = obj$contrasts)
>> if (attr(mt, "intercept") == 1) ## This is my
>> hack to overcome the double-intercept problem
>> { olscf <- summary(lm(y ~ X))$coefficients }
>> else {
>> olscf <- summary(lm(y ~ X - 1))$coefficients
>> }
>> rownames(olscf) <- rownames(coef(obj))
>
> In this instance, you don't want to be working with lm. You can use
> lm.fit which returns an object with $coefficients, so I would guess you
> need something like this (not tested)
>
> if (ols) {
> obj <- x[[1]]
> mt <- terms(obj)
> mf <- model.frame(obj)
> y <- model.response(mf)
> X <- model.matrix(mt, mf, contrasts = obj$contrasts)
> fit <- lm.fit(X, y) ## store fit in case you need something else
> olscf <- coef(fit) ## get coefficients using an extractor function
> ##if (attr(mt, "intercept") == 1) ## This is my hack to overcome
> the double-intercept problem
> ## { olscf <- summary(lm(y ~ X))$coefficients }
> ##else {
> ## olscf <- summary(lm(y ~ X - 1))$coefficients
> ##}
> ##rownames(olscf) <- rownames(coef(obj))
> }
>
> Note also, that even if you are using your hack, you don't need to do
>
> summary(lm(y ~ X))$coefficients
>
> as
>
> coef(lm(y ~ X))
>
> will do.
>
> lm is mainly there for top-level work. If you need to do things like you
> have above, use lm.fit or do the qr decomposition yourself.
>
> Or try a different way to build the formula you need and work with lm;
> Bill Venables has a nice piece in R News Vol 2 Issue 2 in the
> Programmer's Niche section on something that might be of use if you want
> to build a correct formula for your needs from the colnames of X and y?
>
> All the best,
>
> G
>
>>
>> Thanks again for your input.
>>
>> Regards,
>> Mark.
>>
>>
>>
>>
>>
>> Berwin A Turlach wrote:
>> >
>> > G'day Mark,
>> >
>> > On Wed, 12 Dec 2007 02:05:54 -0800 (PST)
>> > Mark Difford <mark_difford at yahoo.co.uk> wrote:
>> >
>> >> In order to get the same coefficients as we get from the following
>> > [...]
>> >> we need to do the following (if we use model.matrix to specify the
>> >> model)
>> >
>> > By why would you want to do this?
>> >
>> >> ##
>> >> summary ( lm(Gas ~ model.matrix(~ Insul/Temp - 1) - 1, data =
>> >> whiteside) )
>> >>
>> >> That is, we need to take out "two intercepts." Is this "correct"?
>> >
>> > Yes.
>> >
>> >> Shouldn't lm check to see if an intercept has been requested as part
>> >> of the model formula?
>> >
>> > No, it does not. In the Details section of lm's help page you will
>> > find the following:
>> >
>> > A formula has an implied intercept term. To remove this use
>> > either 'y ~ x - 1' or 'y ~ 0 + x'. See 'formula' for more details
>> > of allowed formulae.
>> >
>> > Thus, except if you explicitly ask for a constant term not be included,
>> > lm will add a constant term (a column of ones) additionally to what
>> > ever you have specified on the right hand side of the formula.
>> >
>> >> If I do
>> >> ##
>> >> summary(lm(as.formula(Gas ~ model.matrix (~ Insul/Temp-1,
>> >> data=whiteside)), data=whiteside))
>> >>
>> >> we get a strange model.
>> >
>> > Well, you get a model in which not all parameters are identifiable, and
>> > a particular parameter that is not identifiable is estimated by NA. I
>> > am not sure what is strange about this.
>> >
>> >> But the formula part of this model qualifies
>> >> as a valid formula
>> >> ##
>> >> as.formula(Gas ~ model.matrix (~ Insul/Temp-1, data=whiteside))
>> >
>> > Debatable, the above command only shows that it can be coerced into a
>> > valid formula. :)
>> >
>> >> just as if I were to write: lm(Gas ~ Insul/Temp - 1, data=whiteside)
>> >>
>> >> But we know that the _correct_ formula is the following
>> >
>> >> ##
>> >> as.formula(Gas ~ model.matrix (~ Insul/Temp-1, data=whiteside) -1)
>> >
>> > Why is this formula any more correct than the other one? Both specify
>> > exactly the same model. It is just that one does it in an
>> > overparameterised way.
>> >
>> >> (Sorry, this is getting really long) --- So, my question/confusion
>> >> comes down to wanting to know why lm() doesn't check to see if an
>> >> intercept has been specified when the model has been specified using
>> >> model.matrix.
>> >
>> > Because lm() is documented not to check this. If you do not want to
>> > have an intercept in the model you have to specifically ask it for.
>> >
>> > Also, comparing the output of
>> > summary( lm(Gas ~ Insul/Temp - 1, data = whiteside) )
>> > and
>> > summary( lm(Gas ~ Insul/Temp, data = whiteside ) )
>> >
>> > you can see that lm() does not check whether there is an implicit
>> > intercept in the model. Compare the (Adjusted) R-squared values
>> > returned; one case is using the formula for models with no intercept
>> > the other one the formula for models with intercept. Similar story
>> > with the reported F-statistics.
>> >
>> > Cheers,
>> >
>> > Berwin
>> >
>> > =========================== Full address =============================
>> > Berwin A Turlach Tel.: +65 6515 4416 (secr)
>> > Dept of Statistics and Applied Probability +65 6515 6650 (self)
>> > Faculty of Science FAX : +65 6872 3919
>> > National University of Singapore
>> > 6 Science Drive 2, Blk S16, Level 7 e-mail: statba at nus.edu.sg
>> > Singapore 117546 http://www.stat.nus.edu.sg/~statba
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> >
>> >
>>
> --
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> Dr. Gavin Simpson [t] +44 (0)20 7679 0522
> ECRC, UCL Geography, [f] +44 (0)20 7679 0565
> Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
> Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/
> UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
--
View this message in context: http://www.nabble.com/lm-model.matrix-confusion-%28--bug%29-tp14292188p14300313.html
Sent from the R help mailing list archive at Nabble.com.
More information about the R-help
mailing list