[R] looking for formula parser that allows coefficients

Gabor Grothendieck ggrothendieck at gmail.com
Wed Aug 22 09:33:53 CEST 2018


Some string manipulation can convert the formula to a named vector such as
the one shown at the end of your post.

library(gsubfn)

# input
fo <- y ~ 2 - 1.1 * x1 + x3 - x1:x3 + 0.2 * x2:x2

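# optional sign, optional numeric coefficient, optional "*", optional term name
pat <- "([+-])? *(\\d\\S*)? *\\*? *([[:alpha:]]\\S*)?"
ch <- format(fo[[3]])  # right-hand side of the formula as a string
# one column per match; rows are sign, coefficient, term name
m <- matrix(strapplyc(ch, pat)[[1]], 3)
m <- m[, colSums(m != "") > 0]  # drop empty matches
m[2, m[2, ] == ""] <- 1  # a missing coefficient means 1
m[3, m[3, ] == ""] <- "(Intercept)"  # a missing term name means the intercept
co <- as.numeric(paste0(m[1, ], m[2, ]))  # attach the sign to the coefficient
v <- m[3, ]
setNames(co, v)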
## (Intercept)          x1          x3       x1:x3       x2:x2
##         2.0        -1.1         1.0        -1.0         0.2
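
As a follow-on, here is a minimal sketch of how a named vector like this
might drive the simulator. It is only an illustration under some
assumptions: X is made-up data, term_column is a hypothetical helper (not
part of gsubfn or rockchalk), and x2:x2 is taken to mean the square of x2.
As far as I know, R's own formula machinery simplifies x2:x2 to x2, so the
products are built by hand here rather than with model.matrix.

```r
# named coefficients, as produced by the code above
b <- c("(Intercept)" = 2, x1 = -1.1, x3 = 1, "x1:x3" = -1, "x2:x2" = 0.2)

# made-up predictors standing in for the simulated X matrix (assumption)
set.seed(1)
X <- data.frame(x1 = rnorm(5), x2 = rnorm(5), x3 = rnorm(5))

# hypothetical helper: build one design column from a term name by
# multiplying together the variables separated by ":"
term_column <- function(term, data) {
  if (term == "(Intercept)") return(rep(1, nrow(data)))
  vars <- strsplit(term, ":", fixed = TRUE)[[1]]
  Reduce(`*`, lapply(vars, function(v) data[[v]]))
}

# design matrix with one column per named coefficient
M <- sapply(names(b), term_column, data = X)

# noise-free response; add rnorm(nrow(X), sd = stde) for random error
y <- drop(M %*% b)
```

Because the columns are looked up by name, the order of the coefficients
does not matter and unused terms can simply be left out.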

On Tue, Aug 21, 2018 at 6:46 PM Paul Johnson <pauljohn32 at gmail.com> wrote:
>
> Can you point me at any packages that allow users to write a
> formula with coefficients?
>
> I want to write a data simulator that has a matrix X with lots
> of columns, and then users can generate predictive models
> by entering a formula that uses some of the variables, allowing
> interactions, like
>
> y ~ 2 + 1.1 * x1 + 3 * x3 + 0.1 * x1:x3 + 0.2 * x2:x2
>
> Currently, in the rockchalk package, I have a function that simulates
> data (genCorrelatedData2), but my interface to enter the beta
> coefficients is poor.  I assumed the user would always enter 0's as
> placeholders for the unused coefficients, and the intercept is
> always first. The unnamed vector is too confusing.  I have them specify:
>
> c(2, 1.1, 0, 3, 0, 0, 0.2, ...)
>
> In the documentation I say (ridiculously) it is easy to figure out from
> the examples, but it really isn't.
> The function prints out the equation it thinks you intended; that's
> minimum protection against user error, but still not very good:
>
> dat <- genCorrelatedData2(N = 10, rho = 0.0,
>           beta = c(1, 2, 1, 1, 0, 0.2, 0, 0, 0),
>           means = c(0,0,0), sds = c(1,1,1), stde = 0)
> [1] "The equation that was calculated was"
> y = 1 + 2*x1 + 1*x2 + 1*x3
>  + 0*x1*x1 + 0.2*x2*x1 + 0*x3*x1
>  + 0*x1*x2 + 0*x2*x2 + 0*x3*x2
>  + 0*x1*x3 + 0*x2*x3 + 0*x3*x3
>  + N(0,0) random error
>
> But still, it is not very good.
>
> As I look at this now, I realize I expect just the vech, not the whole vector
> of all interaction terms, so it is even more difficult than I thought to get the
> correct input. Hence, I'd like to let the user write a formula.
>
> The alternative for the user interface is to have named coefficients.
> I can more or less easily allow a named vector for beta
>
> beta = c("(Intercept)" = 1, "x1" = 2, "x2" = 1, "x3" = 1, "x2:x1" = 0.1)
>
> I could build a formula from that.  That's not too bad. But I still think
> it would be cool to allow formula input.
>
> Have you ever seen it done?
> pj
> --
> Paul E. Johnson   http://pj.freefaculty.org
> Director, Center for Research Methods and Data Analysis http://crmda.ku.edu
>
> To write to me directly, please address me at pauljohn at ku.edu.



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



