[Rd] Issues with drop.terms

Ben Bolker bbo|ker @end|ng |rom gm@||@com
Tue Aug 24 00:03:45 CEST 2021

   Small follow-up: (1) in order for lm() to actually work you need 
keep.response=TRUE in the drop.terms() call (I realize that this is 
*not* the problem in your example)

test4 <- terms(mpg ~ hp + I(cyl==4) + disp + wt )
check4 <- drop.terms(test4, 3, keep.response = TRUE)
lm( check4, data=mtcars)

(2) I'm ambivalent about your "We can argue that the user should have 
used I(cyl==4), but very many won't." argument. This is the ever-present 
"document precisely and require users to know and follow the 
documentation" vs. "try to protect users from themselves" debate - 
taking either side to an extreme is (IMO) unproductive. I don't know how 
hard it would be to make drop.terms() **not** drop parentheses, but it 
seems like it may be very hard/low-level. My vote would be to see if 
there is a reasonably robust way to detect these constructions and 
**warn** about them.

   I have probably asked about this before, but if anyone knows of 
useful materials that go into more details about the definitions and 
implementation of model matrix/terms/etc. machinery, *beyond* the 
appropriate chapter of "Statistical Models in S" (Becker/Chambers white 
book), *or* the source code itself, I would love some pointers ...

  Ben Bolker

On 8/23/21 10:36 AM, Therneau, Terry M., Ph.D. via R-devel wrote:
> This is a follow-up to my earlier note on [.terms.   Based on a couple days' work getting
> the survival package to work around  issues, this will hopefully be more concise and
> better expressed than the prior note.
> 1.
> test1 <- terms( y ~ x1:x2 + x3)
> check <- drop.terms(termobj =test1, dropx = 1)
> formula(check)
> ## ~x1:x2
> The documentation for the dropx argument is "vector of positions of variables to drop from
> the right hand side of the model", but it is not clear what "positions" is.   I originally
> assumed "the order in the formula as typed", but was wrong.   I suggest adding a line
> "Position refers to the order of terms in the term.labels attribute of the terms object,
> which is also the order they will appear in a coefficient vector (not counting the
> intercept).
> 2.
> library(splines)
> test2 <- terms(model.frame(mpg ~  offset(cyl) + ns(hp, df=3) + disp + wt, data=mtcars))
> check2 <- drop.terms(test2,  dropx = 2)
> formula(check2)
> ## ~ns(hp, df=3) + wt
> One side effect of how drop.terms is implemented, and one that I suspect was not intended,
> is that offsets are completly ignored.    The above drops both the offset and the disp
> term from the formula   The dataClasses and predvars attributes of the result are also
> incorrect: they have lost the ns() term rather than the disp term;
> the results of predict will be incorrect.
> attr(check2, "predvars")
> ##    list(offset(cyl), disp, wt)
> Question: should the function be updated to not drop offsets? If not a line needs to be
> added to the help file.   The handling of predvars needs to be fixed regardless.
> 3.
> test3 <- terms(mpg ~ hp + (cyl==4) + disp + wt )
> check3 <- drop.terms(test3, 3)
> formula(check3)
> lm( check3, data=mtcars)   # fails
> The drop.terms action has lost the () around the logical expression, which leads to an
> invalid formula.  We can argue that the user should have used I(cyc==4), but very many won't.
> 4. As a footnote, more confusion (for me) is generated by the fact that the "specials"
> attribute of a formula does not use the numbering discussed in 1 above.   I had solved
> this issue long ago in the untangle.specials function; long enough ago that I forgot I had
> solved it, and just wasted a day rediscovering that fact.
> ---
> I can create a patch for 1 and 2 (once we answer my question), but a fix for 3 is not
> clear to me.  It currently leads to failure in a coxph call that includes a strata so I am
> directly interested in a solution; e.g.,  coxph(Surv(time, status) ~ age + (ph.ecog==2) +
> strata(inst), data=lung)
> Terry T

Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
Graduate chair, Mathematics & Statistics

More information about the R-devel mailing list