[R-pkg-devel] [External] Formula modeling

pikappa.devel using gmail.com
Fri Oct 8 20:47:16 CEST 2021


Hi,

The different environments can potentially be an issue in the future. I was not aware of the vector construction notation, and I think this is what I was mainly looking for. 

I could provide two initialization methods. The first would use the (admittedly ugly) vector notation, which binds the whole model to a single environment. The second could be more user-friendly and accept a comma-separated list of formulas; it would simply build the vector formula and call the first initialization method.
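A minimal sketch of how the list-based method could build the vector formula for the first method, assuming every element is a two-sided formula. The helper name `list_to_vector_formula` is hypothetical, and binding the result to the environment of the first formula is only one possible convention, not something settled in this thread.

```r
# Hypothetical helper: combine list(d ~ ..., s ~ ..., p ~ ...) into
# the single formula c(d, s, p) ~ c(..., ..., ...).
list_to_vector_formula <- function(formulas, env = environment(formulas[[1]])) {
  is_two_sided <- vapply(formulas,
                         function(f) inherits(f, "formula") && length(f) == 3,
                         logical(1))
  stopifnot(length(formulas) > 0, all(is_two_sided))
  lhs <- as.call(c(as.name("c"), lapply(formulas, `[[`, 2)))  # responses
  rhs <- as.call(c(as.name("c"), lapply(formulas, `[[`, 3)))  # right-hand sides
  f <- eval(call("~", lhs, rhs))
  # Assumed convention: the whole model uses one environment (by default
  # that of the first formula), so p means the same thing in every equation.
  environment(f) <- env
  f
}

eqs <- list(d ~ p + x + y, s ~ p + w + y, p ~ z + y)
list_to_vector_formula(eqs)
#> c(d, s, p) ~ c(p + x + y, p + w + y, z + y)
```

The friendlier constructor would then only need to call the vector-notation constructor on this result, so the model-building logic lives in one place.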

The comment about | being an operator makes sense, and I would also like to avoid that usage as far as is feasible. So I am currently thinking of something along the lines of:

c(d, s, p | subject | time) ~ c(p + x + y, p + w + y, z + y)

This is very similar to how lme4::lmer (see ?lme4::lmer) uses the bar to separate the expressions for the design matrices from the grouping factors. The subject and time variables are actually needed because prices have to be subset by subject and time for various operations required to build the model matrix.
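A minimal sketch of how the bar-extended left-hand side could be taken apart with base R, assuming the identifiers always follow the last response inside `c()`. The helper `split_bar_identifiers` is hypothetical; it is not a function from lme4 or diseq, and lme4 has its own helpers for bar terms that this sketch does not use.

```r
# Hypothetical helper: separate c(d, s, p | subject | time) ~ c(...)
# into the plain vector formula c(d, s, p) ~ c(...) and the identifiers.
split_bar_identifiers <- function(f) {
  stopifnot(inherits(f, "formula"), length(f) == 3)
  lhs <- f[[2]]
  stopifnot(is.call(lhs), identical(lhs[[1]], as.name("c")))
  n <- length(lhs)
  last <- lhs[[n]]                 # e.g. p | subject | time
  ids <- character(0)
  # R parses a | b | c as (a | b) | c, so identifiers are peeled off the right.
  while (is.call(last) && identical(last[[1]], as.name("|"))) {
    ids <- c(deparse(last[[3]]), ids)
    last <- last[[2]]
  }
  lhs[[n]] <- last                 # keep only the price variable, e.g. p
  plain <- eval(call("~", lhs, f[[3]]))
  environment(plain) <- environment(f)  # keep the caller's single environment
  list(formula = plain, identifiers = ids)
}

res <- split_bar_identifiers(c(d, s, p | subject | time) ~ c(p + x + y, p + w + y, z + y))
res$identifiers
#> [1] "subject" "time"
```

Because `|` stays inside `c()`, it never competes with `~` at the top level of the formula, which keeps the parse unambiguous.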

Thanks for the suggestions; they are very helpful!

Best,
Pantelis

-----Original Message-----
From: Duncan Murdoch <murdoch.duncan using gmail.com> 
Sent: Friday, October 8, 2021 2:04 AM
To: Richard M. Heiberger <rmh using temple.edu>; pikappa.devel using gmail.com
Cc: r-package-devel using r-project.org
Subject: Re: [R-pkg-devel] [External] Formula modeling

On 07/10/2021 5:58 p.m., Duncan Murdoch wrote:
> I don't work with models like this, but I would find it more natural 
> to express the multiple formulas in a list:
> 
>     list(d ~ p + x + y, s ~ p + w + y, p ~ z + y)
> 
> I'd really have no idea how either of the proposals below should be parsed.

There's a disadvantage to this proposal. I'd assume that "p" means the same in all 3 formulas, but with the notation I give, it could refer to 3 unrelated variables, because each of the formulas would have its own environment, and they could all be different. I guess you could make it a requirement that they all use the same environment, but that's likely going to be confusing to users, who won't know what it means.
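A small illustration of the environment problem with a list of formulas, plus a check for the "same environment" requirement. Both `share_environment` and `make_demand` are hypothetical names used only for this sketch.

```r
# Hypothetical helper: TRUE if every formula in the list shares one environment.
share_environment <- function(formulas) {
  envs <- lapply(formulas, environment)
  all(vapply(envs, identical, logical(1), envs[[1]]))
}

# A formula created inside a function captures that function's frame.
make_demand <- function() {
  p <- "local p"        # shadows any p defined outside
  d ~ p + x + y
}

p <- "outer p"
eqs <- list(make_demand(), s ~ p + w + y)

share_environment(eqs)                      # FALSE
get("p", envir = environment(eqs[[1]]))     # "local p"
get("p", envir = environment(eqs[[2]]))     # "outer p"
```

The same symbol `p` resolves to two different objects, which is exactly the ambiguity a single vector formula avoids.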

Another possibility that wouldn't have this problem (but in my opinion is kind of ugly) is to use R vector construction notation:

   c(d, s, p) ~ c(p + x + y, p + w + y, z + y)
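To see why this notation sidesteps the environment issue, here is a minimal sketch that splits the vector formula back into one formula per equation, all bound to the environment of the original formula. The function name `split_vector_formula` is hypothetical.

```r
# Hypothetical helper: c(d, s, p) ~ c(r1, r2, r3)  ->  list(d ~ r1, s ~ r2, p ~ r3)
split_vector_formula <- function(f) {
  stopifnot(inherits(f, "formula"), length(f) == 3)
  lhs <- f[[2]]
  rhs <- f[[3]]
  stopifnot(is.call(lhs), identical(lhs[[1]], as.name("c")),
            is.call(rhs), identical(rhs[[1]], as.name("c")),
            length(lhs) == length(rhs))  # one right-hand side per response
  lapply(seq_len(length(lhs) - 1), function(i) {
    eq <- eval(call("~", lhs[[i + 1]], rhs[[i + 1]]))
    environment(eq) <- environment(f)  # every equation shares f's environment
    eq
  })
}

equations <- split_vector_formula(c(d, s, p) ~ c(p + x + y, p + w + y, z + y))
equations[[3]]
#> p ~ z + y
```

Since all equations come from one formula, a symbol such as p is looked up in the same place for every equation.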

Duncan Murdoch

> 
> Of course, if people working with models like this are used to working 
> with notation like yours, that would be a strong argument to use your 
> notation.
> 
> Duncan Murdoch
> 
> On 07/10/2021 5:51 p.m., Richard M. Heiberger wrote:
>> I am responding to a subset of what you asked.  There are packages 
>> which use multiple formulas in their argument sequence.
>>
>>
>> What you have as a single formula with | as a separator,
>>
>>     q | p | subject | time | rho ~ p + x + y | p + w + y | z + y
>>
>> I think would be better as a comma-separated list of formulas
>>
>>     q , p , subject , time , rho ~ p + x + y , p + w + y , z + y
>>
>> because in R notation | is usually an operator, not a separator.
>>
>> lattice uses formulas and the | is used as a conditioning operator.
>>
>> nlme and lme4 can have multiple formulas in the same calling sequence.
>>
>> lme4 is newer. From its ?lme4-package help page: "‘lme4’ covers
>> approximately the same ground as the earlier ‘nlme’ package."
>>
>> lme4 is probably the model to follow for the package design.
>>
>>> On Oct 07, 2021, at 17:20, pikappa.devel using gmail.com wrote:
>>>
>>> Dear R-package-devel subscribers,
>>>
>>> My question concerns a package design issue relating to the usage of
>>> formulas.
>>>
>>> I am interested in describing via formulas systems of the form:
>>>
>>>     d = p + x + y
>>>     s = p + w + y
>>>     p = z + y
>>>     q = min(d, s).
>>>
>>> The context in which I am working is that of market models with,
>>> primarily, panel data. In the above system, one may think of the
>>> first equation as demand, the second as supply, and the third as an
>>> equation (co-)determining prices. The fourth equation is implicitly
>>> used by the estimation method, and it does not need to be specified
>>> when programming the R formula. If you need more information about
>>> the system, you may check the package diseq.
>>>
>>> Currently, I am using constructors to build market model objects. In
>>> a constructor call, I pass [i] the right-hand sides of the first
>>> three equations as strings, [ii] an argument indicating whether the
>>> equations of the system have correlated shocks, [iii] the
>>> identifiers of the dataset (one for the subjects of the panel
>>> and one for time), and [iv] the quantity (q) and price (p)
>>> variables. These four arguments contain all the information
>>> necessary for constructing a model.
>>>
>>> I would now like to re-implement model construction using formulas,
>>> which would be more familiar to most R users. I am currently
>>> considering passing all the above information with a single formula
>>> of the form:
>>>
>>>     q | p | subject | time | rho ~ p + x + y | p + w + y | z + y
>>>
>>> where subject and time are the identifiers, and rho indicates
>>> whether correlated or independent shocks should be used.
>>>
>>> I am unaware of other packages that use formulas in this way (for
>>> instance, passing the identifiers in the formula), and I wonder
>>> whether this would go against any good practices. Would it be better
>>> to exclude some of the elements needed to construct the model? This
>>> might make the resulting formulas more similar to those of models
>>> with multiple responses or multiple parts. I am not sure, though,
>>> how one would use such model formulas without all the relevant
>>> information. Is there any suggested design alternative that I could
>>> check?
>>>
>>> I would appreciate any suggestions and discussion!
>>>
>>> Kind regards,
>>>
>>> Pantelis
>>>
>>> ______________________________________________
>>> R-package-devel using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-package-devel
>>
>>
> 


