[R] how to suppress the intercept in an lm()-like formula method?
Michael Friendly
friendly at yorku.ca
Tue Jan 29 16:21:58 CET 2013
On 1/29/2013 9:25 AM, Duncan Murdoch wrote:
> On 29/01/2013 9:14 AM, Michael Friendly wrote:
>> To partly answer my own question: It wasn't that hard to hack the
>> result of model.matrix() to remove the intercept,
>>
>> remove.intercept <- function(x) {
>> if (colnames(x)[1] == "(Intercept)") {
>> x <- x[,-1]
>> attr(x, "assign") <- attr(x, "assign")[-1]
>> }
>> x
>> }
>
> I think you need to do some of the low level calculations yourself,
> specifically the "terms" calculation.
>
> For example, this forces no intercept, regardless of what the user
> specifies. You might prefer just to change the default to no
> intercept and allow the user to use "+1" to add one, which looks
> harder...
>
> # .... set formula to the user's formula ...
> # Now modify it to suppress the intercept:
>
> class(formula) <- c("nointercept", class(formula))
> terms.nointercept <- function(x, ...) {
> result <- NextMethod(x, ...)
> attr(result, "intercept") <- 0
> result
> }
>
That's very clever. I think I can use this locally in my function.
However, it is still a mystery to me *where* in the process terms()
gets called. The main incantation for model formulae in lm() is below,
so who calls terms()?
cl <- match.call()
mf <- match.call(expand.dots = FALSE)
m <- match(c("formula", "data", "subset", "weights", "na.action"),
names(mf), 0L)
mf <- mf[c(1L, m)]
mf[[1L]] <- as.name("model.frame")
mf <- eval(mf, parent.frame())
mt <- attr(mf, "terms")
It is also a mystery to me where na.action takes its effect. Presumably
this is somewhere before lm.fit(x, y, ...)
gets called, but I'd like to take account of this in my cancor.default
method.
-Michael
> Now lm(formula) does a fit with no intercept.
>
> Duncan Murdoch
>
>>
>> However, the model frame and therefore the model terms stored in the
>> object are wrong, still including the intercept:
>>
>> Browse[1]> str(mt)
>> Classes 'terms', 'formula' length 3 cbind(SAT, PPVT, Raven) ~ n + s + ns
>> + na + ss
>> ..- attr(*, "variables")= language list(cbind(SAT, PPVT, Raven), n,
>> s, ns, na, ss)
>> ..- attr(*, "factors")= int [1:6, 1:5] 0 1 0 0 0 0 0 0 1 0 ...
>> .. ..- attr(*, "dimnames")=List of 2
>> .. .. ..$ : chr [1:6] "cbind(SAT, PPVT, Raven)" "n" "s" "ns" ...
>> .. .. ..$ : chr [1:5] "n" "s" "ns" "na" ...
>> ..- attr(*, "term.labels")= chr [1:5] "n" "s" "ns" "na" ...
>> ..- attr(*, "order")= int [1:5] 1 1 1 1 1
>> ..- attr(*, "intercept")= int 1
>> ..- attr(*, "response")= int 1
>> ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
>> ..- attr(*, "predvars")= language list(cbind(SAT, PPVT, Raven),
>> n, s,
>> ns, na, ss)
>> ..- attr(*, "dataClasses")= Named chr [1:6] "nmatrix.3" "numeric"
>> "numeric" "numeric" ...
>> .. ..- attr(*, "names")= chr [1:6] "cbind(SAT, PPVT, Raven)" "n" "s"
>> "ns" ...
>> Browse[1]>
>>
>>
>> On 1/29/2013 8:44 AM, Michael Friendly wrote:
>> > I'm trying to write a formula method for canonical correlation
>> analysis,
>> > that could be called similarly to lm() for
>> > a multivariate response:
>> >
>> > cancor(cbind(y1,y2,y3) ~ x1+x2+x3+x4, data=, ...)
>> > or perhaps more naturally,
>> > cancor(cbind(y1,y2,y3) ~ cbind(x1,x2,x3,x4), data=, ...)
>> >
>> > I've adapted the code from lm() to my case, but in this situation, it
>> > doesn't make sense to
>> > include an intercept, since X & Y are mean centered by default in the
>> > computation.
>> >
>> > In the code below, I can't see where the intercept gets included in
>> the
>> > model matrix and therefore
>> > how to suppress it. There is a test case at the end, showing that the
>> > method fails when called
>> > normally, but works if I explicitly use -1 in the formula. I could
>> hack
>> > the result of model.matrix(),
>> > but maybe there's an easier way?
>> >
>> > cancor <- function(x, ...) {
>> > UseMethod("cancor", x)
>> > }
>> >
>> > cancor.default <- candisc:::cancor
>> >
>> > # TODO: make cancisc::cancor() use x, y, not X, Y
>> > cancor.formula <- function(formula, data, subset, weights,
>> > na.action,
>> > method = "qr",
>> > model = TRUE,
>> > x = FALSE, y = FALSE, qr = TRUE,
>> > contrasts = NULL, ...) {
>> >
>> > cl <- match.call()
>> > mf <- match.call(expand.dots = FALSE)
>> > m <- match(c("formula", "data", "subset", "weights",
>> "na.action"),
>> > names(mf), 0L)
>> > mf <- mf[c(1L, m)]
>> >
>> > mf[[1L]] <- as.name("model.frame")
>> > mf <- eval(mf, parent.frame())
>> >
>> > mt <- attr(mf, "terms")
>> > y <- model.response(mf, "numeric")
>> > w <- as.vector(model.weights(mf))
>> > if (!is.null(w) && !is.numeric(w))
>> > stop("'weights' must be a numeric vector")
>> >
>> > x <- model.matrix(mt, mf, contrasts)
>> > # fixup to remove intercept???
>> > z <- if (is.null(w))
>> > cancor.default(x, y, ...)
>> > else stop("weights are not yet implemented") # lm.wfit(x, y,
>> w, ...)
>> >
>> > z$call <- cl
>> > z$terms <- mt
>> > z
>> > }
>> >
>> > TESTME <- FALSE
>> > if (TESTME) {
>> >
>> > # need to get latest version, 0.6-1 from R-Forge
>> > install.packages("candisc", repo="http://R-Forge.R-project.org")
>> > library(candisc)
>> >
>> > data(Rohwer)
>> >
>> > # this bombs: needs intercept removed
>> > cc <- cancor.formula(cbind(SAT, PPVT, Raven) ~ n + s + ns + na + ss,
>> > data=Rohwer)
>> > ## Error in chol.default(Rxx) :
>> > ## the leading minor of order 1 is not positive definite
>> >
>> > #this works as is
>> > cc <- cancor.formula(cbind(SAT, PPVT, Raven) ~ -1 + n + s + ns + na +
>> > ss, data=Rohwer)
>> > cc
>> > ## Canonical correlation analysis of:
>> > ## 5 X variables: n, s, ns, na, ss
>> > ## with 3 Y variables: SAT, PPVT, Raven
>> > ##
>> > ## CanR CanRSQ Eigen percent cum
>> > ## 1 0.6703 0.44934 0.81599 77.30 77.30
>> > ## 2 0.3837 0.14719 0.17260 16.35 93.65
>> > ## 3 0.2506 0.06282 0.06704 6.35 100.00
>> > ##
>> > ## Test of H0: The canonical correlations in the
>> > ## current row and all that follow are zero
>> > ##
>> > ...
>> > }
>> >
>> >
>>
>>
>
--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept. & Chair, Quantitative Methods
York University Voice: 416 736-2100 x66249 Fax: 416 736-5814
4700 Keele Street Web: http://www.datavis.ca
Toronto, ONT M3J 1P3 CANADA
More information about the R-help
mailing list