[R] formula wrangling

Koenker, Roger W rkoenker @end|ng |rom ||||no|@@edu
Mon Sep 21 10:52:48 CEST 2020

I need some help with a formula processing problem that arose from a seemingly innocuous  request
that I add a “subset” argument to the additive modeling function “rqss” in my quantreg package.

I’ve tried to boil the relevant code down to something simpler as illustrated below.  The formulae in
question involve terms called “qss” that construct sparse matrix objects, but I’ve replaced all that with
a much simpler BoxCox construction that I hope illustrates the basic difficulty.  What is supposed to happen
is that xss objects are evaluated and cbind’d to the design matrix, subject to the same subset restriction
as the rest of the model frame.  However, this doesn’t happen, instead the xss vectors are evaluated
on the full sample and the cbind operation generates a warning which probably should be an error.
I’ve inserted a browser() to make it easy to verify that the length of xss[[[1]] doesn’t match dim(X).

Any suggestions would be most welcome, including other simplifications of the code.  Note that
the function untangle.specials() is adapted, or perhaps I should say adopted form the survival 
package so you would need the quantreg package to run the attached code.


fit <- function(formula, subset, data, ...){
    call <- match.call()
    m <- match.call(expand.dots = FALSE)
    tmp <- c("", "formula", "subset", "data")
    m <- m[match(tmp, names(m), nomatch = 0)]
    m[[1]] <- as.name("model.frame")
    Terms <- if(missing(data)) terms(formula,special = "qss")
	    else terms(formula, special = "qss", data = data)
    qssterms <- attr(Terms, "specials")$qss
    if (length(qssterms)) {
        tmpc <- untangle.specials(Terms, "qss")
        dropx <- tmpc$terms
        if (length(dropx)) 
            Terms <- Terms[-dropx]
        attr(Terms, "specials") <- tmpc$vars
	fnames <- function(x) {
            fy <- all.names(x[[2]])
            if (fy[1] == "cbind") 
                fy <- fy[-1]
        fqssnames <- unlist(lapply(parse(text = tmpc$vars), fnames))
        qssnames <- unlist(lapply(parse(text = tmpc$vars), function(x) deparse(x[[2]])))
    if (exists("fqssnames")) {
        ffqss <- paste(fqssnames, collapse = "+")
        ff <- as.formula(paste(deparse(formula), "+", ffqss))
    m$formula <- Terms
    m <- eval(m, parent.frame())
    Y <- model.extract(m, "response")
    X <- model.matrix(Terms, m)
    ef <- environment(formula)
    qss <- function(x, lambda) (x^lambda - 1)/lambda
    if (length(qssterms) > 0) {
        xss <- lapply(tmpc$vars, function(u) eval(parse(text = u), m, enclos = ef))
	for(i in 1:length(xss)){
	    X <- cbind(X, xss[[i]]) # Here is the problem
    z <- lm.fit(X,Y) # The dreaded least squares fit
# Test case
n <- 200
x <- sort(rchisq(n,4))
z <- rnorm(n)
s <- sample(1:n, n/2)
y <- log(x) + rnorm(n)/5
D = data.frame(y = y, x = x, z = z, s = (1:n) %in% s)
plot(x, y)
lam = 0.2
#f0 <- fit(y ~ qss(x,lambda = lam) + z, subset = s)
f1 <- fit(y ~ qss(x, lambda = lam) + z, subset = s, data = D)

More information about the R-help mailing list