[RsR] Several bugs for th lmRob function
Olivier Renaud
Olivier.Renaud at unige.ch
Tue Dec 8 13:56:58 CET 2009
Dear all,
Today, I send two mails related to robust ANOVA in R, but since the
scope and target are different, it is better to separate the subjects.
In the next mail, there is a praise to improve the accessibility of
robust methods for ANOVA. As far as I can understand, roust ANOVA cannot
be run with lmrob from the robustbase library, see the other mail. In
this mail, I list several bugs in lmRob from the robust library.
* There is a bug in the function "lmRob": when formulas with
interaction between factors are given, only the main effects are
considered as factors, but the interaction is considered as a
continuous variable. This is problematic, since the initial
estimator is very likely give an error because of the discrete
type of the interaction. Example:
> summary(lmRob (formula = Resp~Origine*Sexe, data = POVm3))
Error in psi.weight(wi[wcnd], ipsi, xk) :
NA/NaN/Inf in foreign function call (arg 2)
In addition: Warning message:
In lmRob.fit.compute(x2, y, x1 = x1, x1.idx = x1.idx, nrep = nrep, :
Initial scale less than tl = 1e-06 .
Attached is a fix to the function lmRob, see the two lines with "###
added". I'm not a guru programmer so you might have to check the
code. With this corrected function, the above call works.
* It is not a bug, but there are more initial algorithms than
initial.alg in lmRob.control let the user choose. More
importantly, depending on the type of covariates, the user's
choice can be silently overridden (in the function
lmRob.fit.compute). I do not criticize this, but I suggest that
(a) a message is displayed to inform the user of this modification
and (b) that the initial algorithm that is finally used be written
the lmRob object. For the moment, the user has no way to know
which initial algorithm is used.
* There is a bug in the function "anova.lmRob": in the presence of
missing values in the covariates, the models that are compared do
not contain the same number of observations, which drives the
inference completely wrong. Again, I am not a specialist, but I
think it comes from the line
curobj <- update(object, curfrm)
which uses the original data frame instead of using only the
observations with no missing in the FULL model (which are given in
the lm object obj$model). When smaller models are compared,
typically some observations that were not used in the full model are
used. I'm not sure how to fix this bug.
* Maybe related to this missing value problem, I have found an F
value equal to -133, although the corresponding t-value in summary
was close to zero, but not suspect.
> anova(lmRob(ttf~income*sexe,data=cig,na.action=na.omit))
Terms added sequentially (first to last)
Chisq Df RobustF Pr(F)
(Intercept) 1
income 1 69.576 < 2.2e-16 ***
sexe 1 14.989 7.97e-05 ***
income:sexe 1 -133.293 1
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Warning messages:
1: In chi.weight(res/Scale, ipsi, yc) - chi.weight(Res/Scale, ipsi, :
longer object length is not a multiple of shorter object length
2: In chi.weight(res/Scale, ipsi, yc) - chi.weight(Res/Scale, ipsi, :
longer object length is not a multiple of shorter object length
* Some prominent members of the S/R community will not consider this
as a bug, but in a companion mail, I give arguments to include
so-called "Type III sums of squares" or effect tested marginally.
In the context of unbalanced ANOVA, there are other prominent
members of the statistics community that give extremely convincing
arguments, see other mail. I know it can be done by hand, but for
an average user, having it as an optional argument to anova.lmRob
would be an important argument to use R and robust ANOVA.
Cheers,
Olivier
--
!!! New e-mail, please update your address book !!!
Olivier.Renaud at unige.ch http://www.unige.ch/fapse/mad/
Methodology & Data Analysis - Psychology Dept - University of Geneva
UniMail, Office 4164 - 40, Bd du Pont d'Arve - CH-1211 Geneva 4
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://stat.ethz.ch/pipermail/r-sig-robust/attachments/20091208/044d577f/attachment.html>
-------------- next part --------------
lmRob =
function (formula, data, weights, subset, na.action, model = TRUE,
x = FALSE, y = FALSE, contrasts = NULL, nrep = NULL, control = lmRob.control(...),
genetic.control = NULL, ...)
{
the.call <- match.call()
m <- match.call(expand = FALSE)
m$model <- m$x <- m$y <- m$contrasts <- m$nrep <- NULL
m$control <- m$genetic.control <- m$... <- NULL
m[[1]] <- as.name("model.frame")
m <- eval(m, sys.parent())
Terms <- attr(m, "terms")
weights <- model.extract(m, weights)
Y <- model.extract(m, "response")
X <- model.matrix(Terms, m, contrasts)
if (nrow(X) <= ncol(X))
stop("Robust method is inappropriate: not enough observations.")
asgn <- attr(X, "assign")
if (!is.list(asgn))
asgn <- splus.assign(asgn, attr(Terms, "term.labels"))
factor.names <- names(m)[sapply(m, is.factor)] ### Forgets that interaction can also be factors !
at = attr(Terms,"factors") ### added
if (length(factor.names)>0)
factor.names <- names(apply(at[ setdiff(dimnames(at)[[1]], factor.names),, drop=FALSE ], 2, sum)==0) ### added
x1.idx <- unname(unlist(asgn[factor.names]))
X1 <- X[, x1.idx, drop = FALSE]
X2 <- X[, setdiff(1:(dim(X)[2]), x1.idx), drop = FALSE]
p2 <- ncol(X2)
cnam2 <- dimnames(X2)[[2]]
if (dim(X1)[2] > 0 && p2 > 1 && cnam2[1] == "(Intercept)") {
X1 <- cbind(X2[, 1], X1)
dimnames(X1)[[2]][1] <- "(Intercept)"
X2 <- X2[, -1, drop = FALSE]
x1.idx <- c(1L, x1.idx)
}
else if (p2 == 1 && cnam2 == "(Intercept)") {
X1 <- X
x1.idx <- 1:ncol(X)
X2 <- NULL
}
if (length(dim(X1)) && dim(X1)[2] == 0) {
X1 <- NULL
x1.idx <- NULL
}
if (length(dim(X2)) && dim(X2)[2] == 0)
X2 <- NULL
print(x1.idx)
print(X2)
fit <- if (length(weights))
lmRob.wfit(X, Y, weights, x1 = X1, x2 = X2, x1.idx = x1.idx,
nrep = nrep, robust.control = control, genetic.control = genetic.control)
else lmRob.fit(X, Y, x1 = X1, x2 = X2, x1.idx = x1.idx, nrep = nrep,
robust.control = control, genetic.control = genetic.control)
if (is.null(fit))
return(NULL)
fit$terms <- Terms
fit$call <- the.call
x.names <- dimnames(X)[[2]]
pasgn <- asgn
qrx <- qr(X)
Rk <- qrx[["rank"]]
piv <- qrx[["pivot"]][1:Rk]
newpos <- match(1:Rk, piv)
if (length(x.names))
names(newpos) <- x.names
for (j in names(asgn)) {
aj <- asgn[[j]]
aj <- aj[ok <- (nj <- newpos[aj]) <= Rk]
if (length(aj)) {
asgn[[j]] <- aj
pasgn[[j]] <- nj[ok]
}
else asgn[[j]] <- pasgn[[j]] <- NULL
}
effects <- X * matrix(fit$coefficients, byrow = TRUE, nrow = nrow(X),
ncol = ncol(X))
fit$effects <- sqrt(colSums(effects^2))
fit$assign <- asgn
if (model)
fit$model <- m
if (x)
fit$x <- X
if (y)
fit$y <- Y
attr(fit, "na.message") <- attr(m, "na.message")
if (!is.null(attr(m, "na.action")))
fit$na.action <- attr(m, "na.action")
fit
}
More information about the R-SIG-Robust
mailing list