[R-pkg-devel] [External] Formula modeling

Ben Bolker bbolker at gmail.com
Fri Oct 8 00:56:23 CEST 2021


   There's a Formula package on CRAN 
<https://cran.r-project.org/web/packages/Formula/index.html> that's 
designed for this use case.

   lme4 and nlme don't use it, but implement their own formula 
manipulation machinery. (The cleanest version of this machinery is 
actually in glmmTMB at 
https://github.com/glmmTMB/glmmTMB/blob/master/glmmTMB/R/reformulas.R .)
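
   As a rough base-R illustration of what such machinery has to do (a
sketch, not the code used by Formula, lme4 or glmmTMB; `split_bars` is a
hypothetical name), the single formula proposed below can be taken apart
because R parses `|` as an ordinary binary operator that binds more
tightly than `~`:

```r
# Hypothetical helper: recursively split an expression on the `|` operator.
# `|` is left-associative, so a | b | c parses as (a | b) | c, and the
# recursion returns the parts in the order they were written.
split_bars <- function(e) {
  if (is.call(e) && identical(e[[1]], as.name("|"))) {
    c(split_bars(e[[2]]), split_bars(e[[3]]))
  } else {
    list(e)
  }
}

# `~` does not evaluate its arguments, so q, p, ... need not exist
f <- q | p | subject | time | rho ~ p + x + y | p + w + y | z + y

# f[[2]] is the left-hand side of `~`, f[[3]] the right-hand side
lhs_parts <- split_bars(f[[2]])
rhs_parts <- split_bars(f[[3]])

vapply(lhs_parts, deparse, character(1))
# "q" "p" "subject" "time" "rho"
vapply(rhs_parts, deparse, character(1))
# "p + x + y" "p + w + y" "z + y"
```

   Nothing in the formula itself says that the third left-hand part is an
identifier or that `rho` is a flag; that interpretation has to live
entirely in the package code.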

   I would probably recommend Duncan's or Richard's approach, but if you 
want to keep your original syntax then the Formula package is probably 
the way to go.
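
   For comparison, here is a minimal sketch of Duncan's list-of-formulas
alternative, in which the panel identifiers and the correlation flag
become ordinary arguments. `market_model` and its arguments are
hypothetical and are not the diseq API:

```r
# Hypothetical constructor: equations as a list of two-sided formulas,
# everything else as ordinary arguments.
market_model <- function(equations, quantity, price, subject, time,
                         correlated = FALSE) {
  if (!is.list(equations))
    stop("'equations' must be a list of formulas")
  # A two-sided formula has length 3: `~`, left-hand side, right-hand side
  two_sided <- vapply(equations,
                      function(f) inherits(f, "formula") && length(f) == 3L,
                      logical(1))
  if (!all(two_sided))
    stop("every element of 'equations' must be a two-sided formula")
  # Name each equation by its response, e.g. "d", "s", "p"
  names(equations) <- vapply(equations, function(f) deparse(f[[2]]),
                             character(1))
  structure(list(equations = equations,
                 # lapply keeps the names assigned above
                 regressors = lapply(equations, function(f) all.vars(f[[3]])),
                 quantity = quantity, price = price,
                 subject = subject, time = time,
                 correlated = correlated),
            class = "market_model")
}

m <- market_model(list(d ~ p + x + y, s ~ p + w + y, p ~ z + y),
                  quantity = "q", price = "p",
                  subject = "subject", time = "time", correlated = TRUE)
m$regressors$s
# "p" "w" "y"
```

   With this layout each formula keeps its usual meaning, and the
identifiers are validated like any other argument.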


On 10/7/21 5:58 PM, Duncan Murdoch wrote:
> I don't work with models like this, but I would find it more natural to 
> express the multiple formulas in a list:
> 
>    list(d ~ p + x + y, s ~ p + w + y, p ~ z + y)
> 
> I'd really have no idea how either of the proposals below should be parsed.
> 
> Of course, if people working with models like this are used to working 
> with notation like yours, that would be a strong argument to use your 
> notation.
> 
> Duncan Murdoch
> 
> On 07/10/2021 5:51 p.m., Richard M. Heiberger wrote:
>> I am responding to a subset of what you asked. There are packages
>> which use multiple formulas in their argument sequence.
>>
>> What you have as a single formula with | as a separator
>> q | p | subject | time | rho ~ p + x + y | p + w + y | z + y
>> I think would be better as a comma-separated list of formulas
>>
>> q , p , subject , time , rho ~ p + x + y , p + w + y , z + y
>>
>> because in R notation | is usually an operator, not a separator.
>>
>> lattice uses formulas and the | is used as a conditioning operator.
>>
>> nlme and lme4 can have multiple formulas in the same calling sequence.
>>
>> lme4 is newer. From its help page ?lme4-package:
>>       ‘lme4’ covers approximately the same ground as the earlier
>>       ‘nlme’ package.
>>
>> lme4 is probably the model to follow for your package design.
>>
>>> On Oct 07, 2021, at 17:20, pikappa.devel using gmail.com wrote:
>>>
>>> Dear R-package-devel subscribers,
>>>
>>> My question concerns a package design issue relating to the usage of
>>> formulas.
>>>
>>> I am interested in using formulas to describe systems of the form:
>>>
>>> d = p + x + y
>>> s = p + w + y
>>> p = z + y
>>> q = min(d,s).
>>>
>>> The context in which I am working is that of market models with,
>>> primarily, panel data. In the above system, one may think of the first
>>> equation as demand, the second as supply, and the third as an equation
>>> (co-)determining prices. The fourth equation is used implicitly by the
>>> estimation method, and it does not need to be specified in the R
>>> formula. If you need more information about the system, you may check
>>> the package diseq. Currently, I am using constructors to build market
>>> model objects. In a constructor call, I pass [i] the right-hand sides
>>> of the first three equations as strings, [ii] an argument indicating
>>> whether the equations of the system have correlated shocks, [iii] the
>>> identifiers in the dataset (one for the subjects of the panel and one
>>> for time), and [iv] the quantity (q) and price (p) variables. These
>>> four arguments contain all the information necessary for constructing
>>> a model.
>>>
>>> I would now like to re-implement model construction using formulas,
>>> which would be more familiar to most R users. I am currently
>>> considering passing all the above information in a single formula of
>>> the form:
>>>
>>> q | p | subject | time | rho ~ p + x + y | p + w + y | z + y
>>>
>>> where subject and time are the identifiers, and rho indicates whether
>>> correlated or independent shocks should be used.
>>>
>>> I am unaware of other packages that use formulas in this way (for
>>> instance, passing the identifiers in the formula), and I wonder if this
>>> would go against any good practices. Would it be better to exclude some
>>> of the elements necessary for constructing the model? This might make
>>> the resulting formulas more similar to those of models with multiple
>>> responses or multiple parts. I am not sure, though, how one would use
>>> such model formulas without all the relevant information. Is there any
>>> suggested design alternative that I could check?
>>>
>>> I would appreciate any suggestions and discussion!
>>>
>>> Kind regards,
>>>
>>> Pantelis
>>>
>>> ______________________________________________
>>> R-package-devel using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-package-devel
>>>
>>
> 

-- 
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
Graduate chair, Mathematics & Statistics


