[Rd] model.matrix.default chokes on backquote (PR#7202)
Gabor Grothendieck
ggrothendieck at myway.com
Sat Aug 28 17:15:00 CEST 2004
>
> From: Peter Dalgaard <p.dalgaard at biostat.ku.dk>
>
> "Gabor Grothendieck" <ggrothendieck at myway.com> writes:
>
> > > ggrothendieck at myway.com writes:
> > >
> > > > The following gives an error:
> > > >
> > > > > `a(b)` <- 1:4
> > > > > `c(d)` <- (1:4)^2
> > > > > lm(`a(b)` ~ `c(d)`)
> > > > Error in model.matrix.default(mt, mf, contrasts) :
> > > > model frame and formula mismatch in model.matrix()
> > > >
> > > > To fix it replace this line in model.matrix.default:
> > > >
> > > > reorder <- match(attr(t, "variables")[-1], names(data))
> > > >
> > > > with these two lines:
> > > >
> > > > strip.backquote <- function(x) gsub("^`(.*)`", "\\1", x)
> > > > reorder <- match(strip.backquote(attr(t, "variables"))[-1],
> > > > strip.backquote(names(data)))
> > >
> > > Hmm.. Yes, there's a bug (and it's likely not the only one we have
> > > relating to odd variable names in model formulas), but I suspect that
> > > the fix is wrong.
> > >
> > > The backquotes are not part of the variable names, but get added by
> > > deparsing -- sometimes! Other times they do not: Try for instance
> > > as.character(quote(`a(b)`)). (Which is as it should be. Other pieces
> > > of logic relating to nonsyntactical names represent some rather
> > > awkward compromises.)
> > >
> > > When backquotes have found their way into names(data) or the
> > > "variables" attribute, I would rather suspect that they were created
> > > by the wrong tool and fix that, not cure the symptom by stripping them
> > > off at a later stage.
> >
> > In model.frame.default there is a line:
> >
> > varnames <- as.character(vars[-1])
> >
> > that turns part of a call object, vars, into a character string.
> > We could change that to:
> >
> > varnames <- strip.backquote(as.character(as.list(vars[-1])))
> >
> > or perhaps as.character should not return the backquotes in the
> > first place in which case the fix would be to fix as.character.
>
> Or not use it in this way. I forget what the reasoning was behind the
> current behaviour of as.character, but the point is that
>
> > as.character(attr(terms(`a(b)`~`c(d)`),"variables"))
> [1] "list" "`a(b)`" "`c(d)`"
>
> whereas for instance
>
> > sapply(attr(terms(`a(b)`~`c(d)`),"variables")[-1],as.character)
> [1] "a(b)" "c(d)"
1. That is quite subtle, but a fix based on applying as.character to
each element, as in your second example, would appear to solve it.
2. Your example, and possibly some explanatory text, should be added
to ?as.character.
3. While looking for the offending spot, I seem to remember (though I
did not keep track of it) that one or more of lm, model.frame.default,
terms.formula, etc. also apply as.character directly to a list, as in
your first example. These should probably be changed to match your
second example as well, where as.character is applied to the elements
of the list rather than to the list itself.
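
A minimal sketch of the element-wise approach from points 1 and 3,
using the variables from the original report. The data frame below is
only a hypothetical stand-in for the model frame that lm builds
internally, and this is not the actual patch to model.frame.default or
model.matrix.default.

```r
# Nonsyntactic variable names, as in the original report.
`a(b)` <- 1:4
`c(d)` <- (1:4)^2

# The "variables" attribute of a terms object is a call:
# list(`a(b)`, `c(d)`)
vars <- attr(terms(`a(b)` ~ `c(d)`), "variables")

# Convert each element separately, dropping the leading `list`.
# as.character() on a single symbol returns its bare name.
varnames <- vapply(as.list(vars)[-1], as.character, character(1))

# Hypothetical stand-in for the model frame built inside lm().
data <- data.frame(`a(b)` = `a(b)`, `c(d)` = `c(d)`,
                   check.names = FALSE)

# The same kind of lookup that model.matrix.default performs.
reorder <- match(varnames, names(data))
print(varnames)
print(reorder)
```

This only covers variables that are plain symbols. A term such as
log(x) is a call, for which as.character returns several strings, so a
real fix would need to handle that case separately.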