[Rd] named arguments in formula and terms
Achim Zeileis
Achim.Zeileis at R-project.org
Fri Mar 10 15:02:38 CET 2017
Hi, we came across the following unexpected (for us) behavior in
terms.formula: When determining whether a term is duplicated, only the
values and order of the arguments in function calls seem to be compared
but not their names. Thus the terms f(x, a = z) and f(x, b = z) are deemed
to be duplicated and one of the terms is thus dropped.
R> attr(terms(y ~ f(x, a = z) + f(x, b = z)), "term.labels")
[1] "f(x, a = z)"
However, changing the arguments or the order of arguments keeps both
terms:
R> attr(terms(y ~ f(x, a = z) + f(x, b = zz)), "term.labels")
[1] "f(x, a = z)" "f(x, b = zz)"
R> attr(terms(y ~ f(x, a = z) + f(b = z, x)), "term.labels")
[1] "f(x, a = z)" "f(b = z, x)"
Is this intended behavior or needed for certain terms?
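A minimal sketch of how one might check whether a formula has lost terms in this way. The helper names additive_operands() and dropped_terms() are hypothetical, and the sketch assumes a purely additive right-hand side (only "+", no ".", no interactions, no explicit intercept terms). It compares the deparsed operands as written against the term labels that terms() keeps:

```r
# Hypothetical helper: collect the deparsed operands of a chain of `+` calls.
additive_operands <- function(e) {
  if (is.call(e) && identical(e[[1L]], as.name("+")) && length(e) == 3L) {
    c(additive_operands(e[[2L]]), additive_operands(e[[3L]]))
  } else {
    # Collapse in case deparse() splits a long call over several strings
    paste(deparse(e, width.cutoff = 500L), collapse = "")
  }
}

# Hypothetical helper: operands written in the formula whose labels
# do not appear among the term labels returned by terms().
dropped_terms <- function(fml) {
  written <- additive_operands(fml[[length(fml)]])  # right-hand side
  kept <- attr(terms(fml), "term.labels")
  setdiff(written, kept)
}

dropped_terms(y ~ f(x, a = z) + f(x, b = z))
```

An empty result means every written operand has a matching term label. What the last call returns depends on how the installed R version compares the two terms.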
We came across this problem when setting up certain smooth regressors with
different kinds of patterns. As a trivial simplified example we can
generate the same kind of problem with rep(). Consider the two dummy
variables rep(x = 0:1, each = 4) and rep(x = 0:1, times = 4). With the
response y = 1:8 I get:
R> lm((1:8) ~ rep(x = 0:1, each = 4) + rep(x = 0:1, times = 4))
Call:
lm(formula = (1:8) ~ rep(x = 0:1, each = 4) + rep(x = 0:1, times = 4))
Coefficients:
(Intercept)  rep(x = 0:1, each = 4)
        2.5                     4.0
So while the model is identified because the two regressors are not the
same, terms.formula does not recognize this and drops the second regressor.
What I would have wanted can be obtained by switching the arguments:
R> lm((1:8) ~ rep(each = 4, x = 0:1) + rep(x = 0:1, times = 4))
Call:
lm(formula = (1:8) ~ rep(each = 4, x = 0:1) + rep(x = 0:1, times = 4))
Coefficients:
(Intercept)  rep(each = 4, x = 0:1)  rep(x = 0:1, times = 4)
          2                       4                        1
Of course, here I could avoid the problem by setting up proper factors
etc. But to me this looks like a potential bug in terms.formula...
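For completeness, a minimal sketch of the workaround mentioned above: compute each regressor once and store it under its own name, so that the formula contains only plain variable names. The column names x_each and x_times are made up for illustration.

```r
# Precompute each regressor as a separate column
# (x_each and x_times are illustrative names, not from any package).
d <- data.frame(
  y       = 1:8,
  x_each  = rep(x = 0:1, each = 4),   # 0 0 0 0 1 1 1 1
  x_times = rep(x = 0:1, times = 4)   # 0 1 0 1 0 1 0 1
)

# Both terms are plain symbols, so they cannot be mistaken for duplicates.
fit <- lm(y ~ x_each + x_times, data = d)
coef(fit)
```

This keeps both regressors and gives the same coefficients (2, 4, 1) as the call with switched arguments.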
Thanks in advance for any insights,
Z