[Rd] specials and ::

Ben Bolker bbo|ker @end|ng |rom gm@||@com
Tue Aug 27 16:04:14 CEST 2024


   I don't see a big downside, but I will say that there's always a bit 
of a tradeoff between "train the users to do it right" (by writing clear 
documentation and informative error messages) and "make things easy for 
the user" (by making the code more complicated to handle things for them 
automatically).

    For example, part of me wishes that (1) there were only one way to 
specify a binomial response with N>1 (preferably by giving proportions 
plus a weights argument) and (2) grouping variables in lme4/nlme/et al. 
always had to be specified as factors (rather than being automatically 
coerced to factors). Making those decisions would avoid so much code 
complexity ... (and eliminate one class of errors, i.e. people including 
a continuous covariate as a random-effect grouping variable because they 
think of 'random effect' and 'nuisance variable' as synonyms ...)
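
    To make point (1) concrete, here is a minimal sketch of two of the 
ways glm() currently accepts a binomial response with N>1. The data are 
simulated, and the names d, succ, n, fit_a and fit_b are made up for 
illustration only.

```r
# Simulated data: 10 groups of 20 trials each (illustrative only)
set.seed(1)
d <- data.frame(x = seq(-1, 1, length.out = 10), n = rep(20, 10))
d$succ <- rbinom(10, size = d$n, prob = plogis(0.5 + d$x))

# (a) two-column matrix of successes and failures
fit_a <- glm(cbind(succ, n - succ) ~ x, family = binomial, data = d)

# (b) proportion of successes, with the number of trials as weights
fit_b <- glm(succ / n ~ x, family = binomial, weights = n, data = d)

# Both specifications describe the same model
all.equal(coef(fit_a), coef(fit_b))
```

Both forms (plus the 0/1 or factor form for N=1) have to be recognized 
and untangled by the family's initialization code, which is the kind of 
complexity I mean.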

   But taking the "train the users to do it right" path does also 
involve more discussion with users ("if your software knows what I 
should be doing why can't it just do it for me?")
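
   For reference, the "make things easy" path in this case, i.e. 
cleaning off the survival:: modifiers before calling terms(), might look 
roughly like the sketch below. This is not code from the survival 
package; strip_ns() and its arguments are hypothetical, and it only 
rewrites the formula without evaluating anything.

```r
# Hypothetical helper: walk a formula (or any call) and replace
# pkg::fun(...) with fun(...) for a chosen set of special names.
strip_ns <- function(expr, pkg = "survival",
                     funs = c("strata", "cluster", "tt")) {
  if (is.call(expr)) {
    head <- expr[[1]]
    # Is the function part itself a call of the form pkg::fun ?
    if (is.call(head) && identical(head[[1]], as.name("::")) &&
        identical(as.character(head[[2]]), pkg) &&
        as.character(head[[3]]) %in% funs) {
      expr[[1]] <- as.name(as.character(head[[3]]))
    }
    # Recurse into the arguments; only nested calls need rewriting
    for (i in seq_along(expr)[-1]) {
      if (is.call(expr[[i]])) expr[[i]] <- strip_ns(expr[[i]], pkg, funs)
    }
  }
  expr
}

f <- Surv(time, status) ~ ph.karno + survival::strata(inst)
g <- strip_ns(f)
g
# the cleaned formula is now visible to specials=
attr(terms(g, specials = "strata"), "specials")$strata
```

The response is left alone here (survival::Surv() is an ordinary 
function call and evaluates fine); only the names listed in funs are 
unqualified, so a stats::offset() or base::"+"() elsewhere in the 
formula would be untouched.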

   cheers
    Ben Bolker

On 2024-08-27 9:43 a.m., Therneau, Terry M., Ph.D. via R-devel wrote:
> You are right of course, Peter, but I can see where some will get confused.   In a formula
> some symbols and functions are special operators, and others are simple functions.   That
> is the reason one needs I(events/time) to put a rate in as a variable.    Someone who
> types 'offset' at the command line will see that there actually IS a function behind the
> scenes.
> 
> Does anyone see a downside to Bill Dunlap's suggestion where the first step of my formula
> processing would be to "clean off" any survival:: modifiers?    That is, something that
> will break? After all, the code already has a lot of  "if (....) "  lines for other common
> user errors.   I could view it as just saving me the time to deal with the 'we found an
> error' emails.   I would output the corrected version as the "call" component.
> 
> Terry
> 
> On 8/27/24 03:38, peter dalgaard wrote:
>> In my view, that's just plain wrong, because strata() is not a function but a special operator in a model formula. Wouldn't it also blow up on stats::offset()?
>>
>> Oh, yes it would:
>>
>>> lm(y~x+offset(z))
>> Call:
>> lm(formula = y ~ x + offset(z))
>>
>> Coefficients:
>> (Intercept)            x
>>        0.7350       0.0719
>>
>>> lm(y~x+stats::offset(z))
>> Call:
>> lm(formula = y ~ x + stats::offset(z))
>>
>> Coefficients:
>>        (Intercept)                 x  stats::offset(z)
>>             0.6457            0.1078            0.8521
>>
>>
>> Or, to be facetious:
>>
>>> lm(y~base::"+"(x,z))
>> Call:
>> lm(formula = y ~ base::"+"(x, z))
>>
>> Coefficients:
>>       (Intercept)  base::"+"(x, z)
>>            0.4516           0.4383
>>
>>
>>
>> -pd
>>
>>> On 26 Aug 2024, at 16:42 , Therneau, Terry M., Ph.D. via R-devel<r-devel using r-project.org>  wrote:
>>>
>>> The survival package makes significant use of the "specials" argument of terms(), before
>>> calling model.frame; it is part of nearly every modeling function. The reason is that
>>> strata arguments simply have to be handled differently than other things on the right hand
>>> side. Likewise for tt() and cluster(), though those are much less frequent.
>>>
>>> I now get "bug reports" from the growing segment that believes one should put
>>> packagename:: in front of every single instance.   For instance
>>>         fit <- survival::survdiff( survival::Surv(time, status) ~ ph.karno +
>>> survival::strata(inst),  data= survival::lung)
>>>
>>> This fails to give the correct answer because it fools terms(formula, specials=
>>> "strata").    I've stood firm in my response of "that's your bug, not mine", but I begin
>>> to believe I am swimming uphill.   One person responded that it was company policy to
>>> qualify everything.
>>>
>>> I don't see an easy way to fix survival, and even if I did it would be a tremendous amount
>>> of work.   What are others' thoughts?
>>>
>>> Terry
>>>
>>>
>>>
>>> -- 
>>>
>>> Terry M Therneau, PhD
>>> Department of Quantitative Health Sciences
>>> Mayo Clinic
>>> therneau using mayo.edu
>>>
>>> "TERR-ree THUR-noh"
>>>
>>> ______________________________________________
>>> R-devel using r-project.org  mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
 > E-mail is sent at my convenience; I don't expect replies outside of 
working hours.


