[Rd] Issues with drop.terms
Therneau, Terry M., Ph.D.
therne@u @end|ng |rom m@yo@edu
Mon Aug 23 16:36:29 CEST 2021
This is a follow-up to my earlier note on [.terms. Based on a couple days' work getting
the survival package to work around issues, this will hopefully be more concise and
better expressed than the prior note.
1.
test1 <- terms( y ~ x1:x2 + x3)
check <- drop.terms(termobj =test1, dropx = 1)
formula(check)
## ~x1:x2
The documentation for the dropx argument is "vector of positions of variables to drop from
the right hand side of the model", but it is not clear what "positions" is. I originally
assumed "the order in the formula as typed", but was wrong. I suggest adding a line
"Position refers to the order of terms in the term.labels attribute of the terms object,
which is also the order they will appear in a coefficient vector (not counting the
intercept).
2.
library(splines)
test2 <- terms(model.frame(mpg ~ offset(cyl) + ns(hp, df=3) + disp + wt, data=mtcars))
check2 <- drop.terms(test2, dropx = 2)
formula(check2)
## ~ns(hp, df=3) + wt
One side effect of how drop.terms is implemented, and one that I suspect was not intended,
is that offsets are completly ignored. The above drops both the offset and the disp
term from the formula The dataClasses and predvars attributes of the result are also
incorrect: they have lost the ns() term rather than the disp term;
the results of predict will be incorrect.
attr(check2, "predvars")
## list(offset(cyl), disp, wt)
Question: should the function be updated to not drop offsets? If not a line needs to be
added to the help file. The handling of predvars needs to be fixed regardless.
3.
test3 <- terms(mpg ~ hp + (cyl==4) + disp + wt )
check3 <- drop.terms(test3, 3)
formula(check3)
lm( check3, data=mtcars) # fails
The drop.terms action has lost the () around the logical expression, which leads to an
invalid formula. We can argue that the user should have used I(cyc==4), but very many won't.
4. As a footnote, more confusion (for me) is generated by the fact that the "specials"
attribute of a formula does not use the numbering discussed in 1 above. I had solved
this issue long ago in the untangle.specials function; long enough ago that I forgot I had
solved it, and just wasted a day rediscovering that fact.
---
I can create a patch for 1 and 2 (once we answer my question), but a fix for 3 is not
clear to me. It currently leads to failure in a coxph call that includes a strata so I am
directly interested in a solution; e.g., coxph(Surv(time, status) ~ age + (ph.ecog==2) +
strata(inst), data=lung)
Terry T
--
Terry M Therneau, PhD
Department of Quantitative Health Sciences
Mayo Clinic
therneau using mayo.edu
"TERR-ree THUR-noh"
[[alternative HTML version deleted]]
More information about the R-devel
mailing list