[R-pkg-devel] [External] Formula modeling

Fri Oct 8 02:03:44 CEST 2021

On 07/10/2021 5:58 p.m., Duncan Murdoch wrote:
> I don't work with models like this, but I would find it more natural to
> express the multiple formulas in a list:
> 
>     list(d ~ p + x + y, s ~ p + w + y, p ~ z + y)
> 
> I'd really have no idea how either of the proposals below should be parsed.

There's a disadvantage to this proposal.  I'd assume that "p" means the 
same in all 3 formulas, but with the notation I give, it could refer to 
3 unrelated variables, because each of the formulas would have its own 
environment, and they could all be different.  I guess you could make it 
a requirement that they all use the same environment, but that's likely 
going to be confusing to users, who won't know what it means.
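
A minimal sketch of the environment issue, using base R only. The names
make_demand and same_env are hypothetical helpers introduced for
illustration; they are not part of any package discussed in this thread.

```r
# A formula remembers the environment in which it was created.
make_demand <- function(p) d ~ p + x + y   # hypothetical helper; the formula captures this call's frame
f1 <- make_demand(p = c(1, 2, 3))          # "p" would resolve to the function argument
f2 <- s ~ p + w + y                        # "p" would resolve wherever this line is evaluated

# Hypothetical check: TRUE only when every formula shares one environment.
same_env <- function(formulas) {
  envs <- lapply(formulas, environment)
  all(vapply(envs, identical, logical(1), envs[[1]]))
}

mixed  <- same_env(list(f1, f2))                         # formulas built in different places
shared <- same_env(list(d ~ p + x + y, s ~ p + w + y))   # formulas built side by side
```

A constructor could run a check like same_env() on its input and stop with
an informative message, rather than silently resolving "p" differently in
each equation.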

Another possibility that wouldn't have this problem (but in my opinion 
is kind of ugly) is to use R vector construction notation:

   c(d, s, p) ~ c(p + x + y, p + w + y, z + y)
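
One way such a formula might be taken apart, as a sketch only: the
function split_system is a hypothetical name, and it assumes both sides
are written literally as calls to c() with the same number of elements.

```r
# Hypothetical helper: turn c(d, s, p) ~ c(rhs1, rhs2, rhs3) into a list of
# ordinary formulas that all share the environment of the original formula.
split_system <- function(f) {
  lhs <- f[[2]]
  rhs <- f[[3]]
  is_c_call <- function(e) is.call(e) && identical(e[[1]], as.name("c"))
  stopifnot(is_c_call(lhs), is_c_call(rhs), length(lhs) == length(rhs))
  env <- environment(f)
  Map(
    function(l, r) eval(call("~", l, r), env),  # evaluating `~` in env sets the formula's environment
    as.list(lhs)[-1],                           # drop the leading `c` symbol
    as.list(rhs)[-1]
  )
}

eqs <- split_system(c(d, s, p) ~ c(p + x + y, p + w + y, z + y))
```

Because there is only one formula object to begin with, there is only one
environment, so "p" cannot end up meaning different things in different
equations.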

Duncan Murdoch

> 
> Of course, if people working with models like this are used to working
> with notation like yours, that would be a strong argument to use your
> notation.
> 
> Duncan Murdoch
> 
> On 07/10/2021 5:51 p.m., Richard M. Heiberger wrote:
>> I am responding to a subset of what you asked.  There are packages which use multiple formulas
>> in their argument sequence.
>>
>>
>> What you have as a single formula with | as a separator
>> q | p | subject | time | rho ~ p + x + y | p + w + y | z + y
>> I think would be better as a comma-separated list of formulas
>>
>> q , p , subject , time , rho ~ p + x + y , p + w + y , z + y
>>
>> because in R notation | is usually an operator, not a separator.
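
A short base-R illustration of the operator point, offered as a sketch.
The variable names are placeholders. It also shows that, as far as the
parser is concerned, bare commas are only accepted inside a call such as
list(...) or as separate function arguments.

```r
f <- q | p ~ a + b | c

# Both sides of the formula are calls to the `|` operator.
lhs_is_bar <- identical(f[[2]][[1]], as.name("|"))
rhs_is_bar <- identical(f[[3]][[1]], as.name("|"))

# `+` binds more tightly than `|`, so the first operand on the right is a + b.
rhs_first <- deparse(f[[3]][[2]])

# A comma at the top level of an expression is a syntax error.
comma_ok <- tryCatch(
  { parse(text = "q , p ~ a , b"); TRUE },
  error = function(e) FALSE
)
```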
>>
>> lattice uses formulas and the | is used as a conditioning operator.
>>
>> nlme and lme4 can have multiple formulas in the same calling sequence.
>>
>> lme4 is newer.  From its ?lme4-package:
>> ‘lme4’ covers approximately the same ground as the earlier ‘nlme’
>>        package.
>>
>> lme4 is probably the model to look to for the package design.
>>
>>> On Oct 07, 2021, at 17:20, pikappa.devel using gmail.com wrote:
>>>
>>> Dear R-package-devel subscribers,
>>>
>>>
>>>
>>> My question concerns a package design issue relating to the usage of
>>> formulas.
>>>
>>>
>>>
>>> I am interested in describing via formulas systems of the form:
>>>
>>> d = p + x + y
>>> s = p + w + y
>>> p = z + y
>>> q = min(d, s).
>>>
>>> The context in which I am working is that of market models with, primarily,
>>> panel data. In the above system, one may think of the first equation as
>>> demand, the second as supply, and the third as an equation (co-)determining
>>> prices. The fourth equation is implicitly used by the estimation method, and
>>> it does not need to be specified when programming the R formula. If you need
>>> more information about the system, you may check the package diseq.
>>> Currently, I am using constructors to build market model objects. In a
>>> constructor call, I pass [i] the right-hand sides of the first three
>>> equations as strings, [ii] an argument indicating whether the equations of
>>> the system have correlated shocks, [iii] the identifiers of the used dataset
>>> (one for the subjects of the panel and one for time), and [iv] the quantity
>>> (q) and price (p) variables. These four arguments contain all the necessary
>>> information for constructing a model.
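
For comparison, right-hand sides held as strings can be turned into
ordinary formulas with base R. This is a sketch under assumptions: the
object names are invented for illustration, and this is not claimed to be
how diseq builds its models.

```r
# Hypothetical inputs: right-hand sides as strings, named by their response.
rhs_strings <- c(d = "p + x + y", s = "p + w + y", p = "z + y")

# One environment for every formula (an assumption made for this sketch).
shared_env <- globalenv()

formulas <- Map(
  function(response, rhs) as.formula(paste(response, "~", rhs), env = shared_env),
  names(rhs_strings), rhs_strings
)
```

Setting env explicitly gives every equation the same environment, whatever
function the strings happen to be processed in.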
>>>
>>>
>>>
>>> I would now like to re-implement model construction using formulas, which
>>> would be a more regular practice for most R users. I am currently
>>> considering passing all the above information with a single formula of the
>>> form:
>>>
>>>
>>>
>>> q | p | subject | time | rho ~ p + x + y | p + w + y | z + y
>>>
>>>
>>>
>>> where subject and time are the identifiers, and rho indicates whether
>>> correlated or independent shocks should be used.
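
A sketch of how the parts of such a formula could be recovered in base R.
The helper split_bar is a hypothetical name; it relies only on `|` being
left-associative and binding more loosely than `+`.

```r
# Hypothetical helper: flatten a left-nested chain of `|` calls into a list.
split_bar <- function(expr) {
  if (is.call(expr) && identical(expr[[1]], as.name("|"))) {
    c(split_bar(expr[[2]]), list(expr[[3]]))
  } else {
    list(expr)
  }
}

f <- q | p | subject | time | rho ~ p + x + y | p + w + y | z + y

lhs_parts <- vapply(split_bar(f[[2]]), deparse, character(1))
rhs_parts <- vapply(split_bar(f[[3]]), deparse, character(1))
```

The pieces come back in positional order only, so the meaning of each
position (quantity, price, identifiers, shock correlation) would have to be
fixed by convention and documented.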
>>>
>>>
>>>
>>> I am unaware of other packages that use formulas in this way (for instance,
>>> passing the identifiers in the formula), and I wonder if this would go
>>> against any good practices. Would it be better to exclude some of the
>>> necessary elements for constructing the model? This might make the resulting
>>> formulas more similar to those of models with multiple responses or multiple
>>> parts. I am not sure, though, how one would use such model formulas without
>>> all the relevant information. Is there any suggested design alternative that
>>> I could check?
>>>
>>>
>>>
>>> I would appreciate any suggestions and discussion!
>>>
>>>
>>>
>>> Kind regards,
>>>
>>> Pantelis
>>>
>>>
>>> ______________________________________________
>>> R-package-devel using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-package-devel
>>
>