[R] looking for formula parser that allows coefficients

Sat Aug 25 04:24:55 CEST 2018

The isChar function used in Parse is:

  isChar <- function(e, ch) identical(e, as.symbol(ch))
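
To see what isChar is testing, here is a small sketch using the same
test formula. The name rhs and the particular subscripts are only for
illustration and are not part of Parse itself; the point is that the
first element of a call is the operator symbol.

```r
# isChar as defined above: TRUE when e is exactly the symbol named ch
isChar <- function(e, ch) identical(e, as.symbol(ch))

fo  <- y ~ 2 - 1.1 * x1 + x3 - x1:x3 + 0.2 * x2:x2
rhs <- fo[[3]]                        # right-hand side of the formula (a call)

top_is_plus  <- isChar(rhs[[1]], "+")            # outermost call is the last `+`
top_is_minus <- isChar(rhs[[1]], "-")            # so this is FALSE
last_is_prod <- isChar(rhs[[3]][[1]], "*")       # its right operand is 0.2 * x2:x2
inner_colon  <- isChar(rhs[[3]][[3]][[1]], ":")  # whose right operand is x2:x2
```

Parse does the same kind of check on e[[1]] at each level and then
recurses into e[[2]] and e[[3]].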

On Fri, Aug 24, 2018 at 10:06 PM Gabor Grothendieck
<ggrothendieck using gmail.com> wrote:
>
> Also here is a solution that uses formula processing rather than
> string processing.
> No packages are used.
>
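> # Recursively walk the parsed formula and return a named coefficient vector.
> # Relies on isChar(e, ch), which tests whether e is the symbol named ch.
> Parse <- function(e) {
>   if (length(e) == 1) {
>     # leaf: a bare number is a constant; a bare name gets coefficient 1
>     if (is.numeric(e)) return(e)
>     else setNames(1, as.character(e))
>   } else {
>     if (isChar(e[[1]], "*")) {
>        # product: multiply the coefficients and join the names
>        x1 <- Recall(e[[2]])
>        x2 <- Recall(e[[3]])
>        setNames(unname(x1 * x2), paste0(names(x1), names(x2)))
>     } else if (isChar(e[[1]], "+")) c(Recall(e[[2]]), Recall(e[[3]]))
>     else if (isChar(e[[1]], "-")) {
>       # unary minus has one operand; binary minus negates the right side
>       if (length(e) == 2) -1 * Recall(e[[2]])
>       else c(Recall(e[[2]]), -Recall(e[[3]]))
>     # interaction: keep the term name, e.g. "x1:x3", with coefficient 1
>     } else if (isChar(e[[1]], ":")) setNames(1, paste(e[-1], collapse = ":"))
>   }
> }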
>
> # test
> fo <- y ~ 2 - 1.1 * x1 + x3 - x1:x3 + 0.2 * x2:x2
> Parse(fo[[3]])
>
> giving:
>
>          x1    x3 x1:x3 x2:x2
>   2.0  -1.1   1.0  -1.0   0.2
>
> On Wed, Aug 22, 2018 at 11:50 AM Paul Johnson <pauljohn32 using gmail.com> wrote:
> >
> > Thanks as usual.  I owe you more KU decorations soon.
> >
> > On Wed, Aug 22, 2018 at 2:34 AM Gabor Grothendieck
> > <ggrothendieck using gmail.com> wrote:
> > >
> > > Some string manipulation can convert the formula to a named vector such as
> > > the one shown at the end of your post.
> > >
> > > library(gsubfn)
> > >
> > > # input
> > > fo <- y ~ 2 - 1.1 * x1 + x3 - x1:x3 + 0.2 * x2:x2
> > >
> > > pat <- "([+-])? *(\\d\\S*)? *\\*? *([[:alpha:]]\\S*)?"
> > > ch <- format(fo[[3]])
> > > m <- matrix(strapplyc(ch, pat)[[1]], 3)
> > > m <- m[, colSums(m != "") > 0]
> > > m[2, m[2, ] == ""] <- 1
> > > m[3, m[3, ] == ""] <- "(Intercept)"
> > > co <- as.numeric(paste0(m[1, ], m[2, ]))
> > > v <- m[3, ]
> > > setNames(co, v)
> > > ## (Intercept)          x1          x3       x1:x3       x2:x2
> > > ##         2.0        -1.1         1.0        -1.0         0.2
> > >
> > > On Tue, Aug 21, 2018 at 6:46 PM Paul Johnson <pauljohn32 using gmail.com> wrote:
> > > >
> > > > Can you point me at any packages that allow users to write a
> > > > formula with coefficients?
> > > >
> > > > I want to write a data simulator that has a matrix X with lots
> > > > of columns, and then users can generate predictive models
> > > > by entering a formula that uses some of the variables, allowing
> > > > interactions, like
> > > >
> > > > y ~ 2 + 1.1 * x1 + 3 * x3 + 0.1 * x1:x3 + 0.2 * x2:x2
> > > >
> > > > Currently, in the rockchalk package, I have a function that simulates
> > > > data (genCorrelatedData2), but my interface to enter the beta
> > > > coefficients is poor.  I assumed the user would always enter 0's as
> > > > placeholders for the unused coefficients, and the intercept is
> > > > always first. The unnamed vector is too confusing.  I have them specify:
> > > >
> > > > c(2, 1.1, 0, 3, 0, 0, 0.2, ...)
> > > >
> > > > In the documentation I say (ridiculously) it is easy to figure out from
> > > > the examples, but it really isn't.
> > > > The function prints out the equation it thinks you intended; that's
> > > > minimum protection against user error, but still not very good:
> > > >
> > > > dat <- genCorrelatedData2(N = 10, rho = 0.0,
> > > >           beta = c(1, 2, 1, 1, 0, 0.2, 0, 0, 0),
> > > >           means = c(0,0,0), sds = c(1,1,1), stde = 0)
> > > > [1] "The equation that was calculated was"
> > > > y = 1 + 2*x1 + 1*x2 + 1*x3
> > > >  + 0*x1*x1 + 0.2*x2*x1 + 0*x3*x1
> > > >  + 0*x1*x2 + 0*x2*x2 + 0*x3*x2
> > > >  + 0*x1*x3 + 0*x2*x3 + 0*x3*x3
> > > >  + N(0,0) random error
> > > >
> > > > But still, it is not very good.
> > > >
> > > > As I look at this now, I realize it expects just the vech, not the whole vector
> > > > of all interaction terms, so it is even more difficult than I thought to get the
> > > > correct input. Hence, I'd like to let the user write a formula.
> > > >
> > > > The alternative for the user interface is to have named coefficients.
> > > > I can more or less easily allow a named vector for beta
> > > >
> > > > beta = c("(Intercept)" = 1, "x1" = 2, "x2" = 1, "x3" = 1, "x2:x1" = 0.1)
> > > >
> > > > I could build a formula from that.  That's not too bad. But I still think
> > > > it would be cool to allow formula input.
> > > >
> > > > Have you ever seen it done?
> > > > pj
> > > > --
> > > > Paul E. Johnson   http://pj.freefaculty.org
> > > > Director, Center for Research Methods and Data Analysis http://crmda.ku.edu
> > > >
> > > > To write to me directly, please address me at pauljohn at ku.edu.
> > >
> > >
> > >
> > > --
> > > Statistics & Software Consulting
> > > GKX Group, GKX Associates Inc.
> > > tel: 1-877-GKX-GROUP
> > > email: ggrothendieck at gmail.com

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com