[RsR] Several bugs for th lmRob function

Tue Dec 8 13:56:58 CET 2009

Dear all,
Today, I send two mails related to robust ANOVA in R, but since the 
scope and target are different, it is better to separate the subjects. 
In the next mail, there is a praise to improve the accessibility of 
robust methods for ANOVA. As far as I can understand, roust ANOVA cannot 
be run with lmrob from the robustbase library, see the other mail. In 
this mail, I list several bugs in lmRob from the robust library.

    * There is a bug in the function "lmRob": when formulas with
      interaction between factors are given, only the main effects are
      considered as factors, but the interaction is considered as a
      continuous variable. This is problematic, since the initial
      estimator is very likely give an error because of the discrete
      type of the interaction. Example:

     > summary(lmRob (formula = Resp~Origine*Sexe, data = POVm3))

    Error in psi.weight(wi[wcnd], ipsi, xk) :
      NA/NaN/Inf in foreign function call (arg 2)
    In addition: Warning message:
    In lmRob.fit.compute(x2, y, x1 = x1, x1.idx = x1.idx, nrep = nrep,  :
      Initial scale less than tl =  1e-06 .

    Attached is a fix to the function lmRob, see the two lines with "###
    added". I'm not a guru programmer so you might have to check the
    code. With this corrected function, the above call works.

    * It is not a bug, but there are more initial algorithms than
      initial.alg in lmRob.control let the user choose. More
      importantly, depending on the type of covariates, the user's
      choice can be silently overridden (in the function
      lmRob.fit.compute). I do not criticize this, but I suggest that
      (a) a message is displayed to inform the user of this modification
      and (b) that the initial algorithm that is finally used be written
      the lmRob object. For the moment, the user has no way to know
      which initial algorithm is used.

    * There is a bug in the function "anova.lmRob": in the presence of
      missing values in the covariates, the models that are compared do
      not contain the same number of observations, which drives the
      inference completely wrong. Again, I am not a specialist, but I
      think it comes from the line

                   curobj <- update(object, curfrm)   
    which uses the original data frame instead of using only the
    observations with no missing in the FULL model (which are given in
    the lm object obj$model). When smaller models are compared,
    typically some observations that were not used in the full model are
    used. I'm not sure how to fix this bug.

    * Maybe related to this missing value problem, I have found an F
      value equal to -133, although the corresponding t-value in summary
      was close to zero, but not suspect.

     > anova(lmRob(ttf~income*sexe,data=cig,na.action=na.omit))

    Terms added sequentially (first to last)

                Chisq Df  RobustF     Pr(F)   
    (Intercept)        1                      
    income             1   69.576 < 2.2e-16 ***
    sexe               1   14.989  7.97e-05 ***
    income:sexe        1 -133.293         1   
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    Warning messages:
    1: In chi.weight(res/Scale, ipsi, yc) - chi.weight(Res/Scale, ipsi,  :
      longer object length is not a multiple of shorter object length
    2: In chi.weight(res/Scale, ipsi, yc) - chi.weight(Res/Scale, ipsi,  :
      longer object length is not a multiple of shorter object length

    * Some prominent members of the S/R community will not consider this
      as a bug, but in a companion mail, I give arguments to include
      so-called "Type III sums of squares" or effect tested marginally.
      In the context of unbalanced ANOVA, there are other prominent
      members of the statistics community that give extremely convincing
      arguments, see other mail. I know it can be done by hand, but for
      an average user, having it as an optional argument to anova.lmRob
      would be an important argument to use R and robust ANOVA.

 Cheers,
Olivier

-- 
!!! New e-mail, please update your address book !!!
Olivier.Renaud using unige.ch               http://www.unige.ch/fapse/mad/
Methodology & Data Analysis - Psychology Dept - University of Geneva
UniMail, Office 4164  -  40, Bd du Pont d'Arve   -  CH-1211 Geneva 4

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://stat.ethz.ch/pipermail/r-sig-robust/attachments/20091208/044d577f/attachment.html>

-------------- next part --------------
 lmRob = 
function (formula, data, weights, subset, na.action, model = TRUE, 
    x = FALSE, y = FALSE, contrasts = NULL, nrep = NULL, control = lmRob.control(...), 
    genetic.control = NULL, ...) 
{
    the.call <- match.call()
    m <- match.call(expand = FALSE)
    m$model <- m$x <- m$y <- m$contrasts <- m$nrep <- NULL
    m$control <- m$genetic.control <- m$... <- NULL
    m[[1]] <- as.name("model.frame")
    m <- eval(m, sys.parent())
    Terms <- attr(m, "terms")
    weights <- model.extract(m, weights)
    Y <- model.extract(m, "response")
    X <- model.matrix(Terms, m, contrasts)
    if (nrow(X) <= ncol(X)) 
        stop("Robust method is inappropriate: not enough observations.")
    asgn <- attr(X, "assign")
    if (!is.list(asgn)) 
        asgn <- splus.assign(asgn, attr(Terms, "term.labels"))
    factor.names <- names(m)[sapply(m, is.factor)]  ### Forgets that interaction can also be factors !
    at = attr(Terms,"factors")   ### added
    if (length(factor.names)>0)
      factor.names <- names(apply(at[ setdiff(dimnames(at)[[1]], factor.names),, drop=FALSE ], 2, sum)==0) ### added
    x1.idx <- unname(unlist(asgn[factor.names]))
    X1 <- X[, x1.idx, drop = FALSE]
    X2 <- X[, setdiff(1:(dim(X)[2]), x1.idx), drop = FALSE]
    p2 <- ncol(X2)
    cnam2 <- dimnames(X2)[[2]]
    if (dim(X1)[2] > 0 && p2 > 1 && cnam2[1] == "(Intercept)") {
        X1 <- cbind(X2[, 1], X1)
        dimnames(X1)[[2]][1] <- "(Intercept)"
        X2 <- X2[, -1, drop = FALSE]
        x1.idx <- c(1L, x1.idx)
    }
    else if (p2 == 1 && cnam2 == "(Intercept)") {
        X1 <- X
        x1.idx <- 1:ncol(X)
        X2 <- NULL
    }
    if (length(dim(X1)) && dim(X1)[2] == 0) {
        X1 <- NULL
        x1.idx <- NULL
    }
    if (length(dim(X2)) && dim(X2)[2] == 0) 
        X2 <- NULL

    print(x1.idx)
    print(X2)

    fit <- if (length(weights)) 
        lmRob.wfit(X, Y, weights, x1 = X1, x2 = X2, x1.idx = x1.idx, 
            nrep = nrep, robust.control = control, genetic.control = genetic.control)
    else lmRob.fit(X, Y, x1 = X1, x2 = X2, x1.idx = x1.idx, nrep = nrep, 
        robust.control = control, genetic.control = genetic.control)
    if (is.null(fit)) 
        return(NULL)
    fit$terms <- Terms
    fit$call <- the.call
    x.names <- dimnames(X)[[2]]
    pasgn <- asgn
    qrx <- qr(X)
    Rk <- qrx[["rank"]]
    piv <- qrx[["pivot"]][1:Rk]
    newpos <- match(1:Rk, piv)
    if (length(x.names)) 
        names(newpos) <- x.names
    for (j in names(asgn)) {
        aj <- asgn[[j]]
        aj <- aj[ok <- (nj <- newpos[aj]) <= Rk]
        if (length(aj)) {
            asgn[[j]] <- aj
            pasgn[[j]] <- nj[ok]
        }
        else asgn[[j]] <- pasgn[[j]] <- NULL
    }
    effects <- X * matrix(fit$coefficients, byrow = TRUE, nrow = nrow(X), 
        ncol = ncol(X))
    fit$effects <- sqrt(colSums(effects^2))
    fit$assign <- asgn
    if (model) 
        fit$model <- m
    if (x) 
        fit$x <- X
    if (y) 
        fit$y <- Y
    attr(fit, "na.message") <- attr(m, "na.message")
    if (!is.null(attr(m, "na.action"))) 
        fit$na.action <- attr(m, "na.action")
    fit
}