[Rd] Two apparent bugs in aov(y~ *** -1 + Error(***)),
with (PR#6520)
ripley at stats.ox.ac.uk
ripley at stats.ox.ac.uk
Mon Feb 2 10:29:28 MET 2004
I believe you are right, but can you please explain why anyone would want
to fit this model? It differs only in the coding from
aov(y ~ a + b + Error(c), data=test.df)
and merely lumps together the top two strata.
There is a much simpler fix: in the line
if(intercept) nmstrata <- c("(Intercept)", nmstrata)
remove the condition (and drop the empty stratum later if you want).
On Fri, 30 Jan 2004 b-h at mevik.net wrote:
> I think there are two bugs in aov() that shows up when the right hand
> side of `formula' contains both `-1' and an Error() term, e.g.,
> aov(y ~ a + b - 1 + Error(c), ...). Without `-1' or `Error()' there
> is no problem. I've included and example, and the source of aov()
> with suggested fixes below.
>
> The first bug (labeled BUG 1 below) creates an extra, empty stratum
> inside `Within' in the result list (se example below).
>
> The second bug (BUG 2) assigns the fit of each stratum at one index
> too high in the result list; i.e., the fit of stratum 1 is assigned to
> `Stratum 2' etc. `Stratum 1' is NULL. (Also the extra stratum added
> by BUG 1, is NULL.)
>
> ***
> *** VERSION INFORMATION
> ***
>
> > version
> _
> platform i386-pc-linux-gnu
> arch i386
> os linux-gnu
> system i386, linux-gnu
> status
> major 1
> minor 8.1
> year 2003
> month 11
> day 21
> language R
>
>
> ***
> *** EXAMPLE
> ***
>
> > test.df <- data.frame (y=rnorm(8), a=gl(2,1,8), b=gl(2,3,8), c=gl(2,4,8))
> > ## With intercept; no problem:
> > aov(y~a+b+Error(c), data=test.df)
>
> Call:
> aov(formula = y ~ a + b + Error(c), data = test.df)
>
> Grand Mean: -0.4105379
>
> Stratum 1: c
>
> Terms:
> b
> Sum of Squares 0.3602348
> Deg. of Freedom 1
>
> Estimated effects are balanced
>
> Stratum 2: Within
>
> Terms:
> a b Residuals
> Sum of Squares 0.0142722 2.7349479 1.7883294
> Deg. of Freedom 1 1 4
>
> Residual standard error: 0.6686422
> Estimated effects may be unbalanced
> > ## Without intercept; incorrect strata assignments:
> > aov(y~a+b-1+Error(c), data=test.df)
>
> Call:
> aov(formula = y ~ a + b - 1 + Error(c), data = test.df)
>
> Stratum 1: c
> NULL
>
> Stratum 2: Within
>
> Terms:
> a b
> Sum of Squares 1.3483306 0.3602348
> Deg. of Freedom 1 1
>
> 1 out of 3 effects not estimable
> Estimated effects may be unbalanced
>
> Stratum 3: NA
> NULL
> > ## Fixed version:
> > fixaov(y~a+b-1+Error(c), data=test.df)
>
> Call:
> fixaov(formula = y ~ a + b - 1 + Error(c), data = test.df)
>
> Stratum 1: c
>
> Terms:
> a b
> Sum of Squares 1.3483306 0.3602348
> Deg. of Freedom 1 1
>
> 1 out of 3 effects not estimable
> Estimated effects may be unbalanced
>
> Stratum 2: Within
>
> Terms:
> a b Residuals
> Sum of Squares 0.0142722 2.7349479 1.7883294
> Deg. of Freedom 1 1 4
>
> Residual standard error: 0.6686422
> 1 out of 3 effects not estimable
> Estimated effects may be unbalanced
> >
>
>
> ***
> *** aov() code with suggested fixes:
> ***
>
> fixaov <- function(formula, data = NULL, projections = FALSE, qr = TRUE,
> contrasts = NULL, ...)
> {
> Terms <- if(missing(data)) terms(formula, "Error")
> else terms(formula, "Error", data = data)
> indError <- attr(Terms, "specials")$Error
> if(length(indError) > 1)
> stop(paste("There are", length(indError),
> "Error terms: only 1 is allowed"))
> lmcall <- Call <- match.call()
> lmcall[[1]] <- as.name("lm")
> lmcall$singular.ok <- TRUE # not currently used in R
> if(projections) qr <- lmcall$qr <- TRUE
> lmcall$projections <- NULL
> if(is.null(indError)) {
> ## no Error term
> fit <- eval(lmcall, parent.frame())
> if(projections) fit$projections <- proj(fit)
> class(fit) <- if(inherits(fit, "mlm"))
> c("maov", "aov", oldClass(fit)) else c("aov", oldClass(fit))
> fit$call <- Call
> return(fit)
> } else {
> ## helmert contrasts can be helpful: do we want to force them?
> ## this version does for the Error model.
> opcons <- options("contrasts")
> options(contrasts=c("contr.helmert", "contr.poly"))
> on.exit(options(opcons))
> allTerms <- Terms
> errorterm <- attr(Terms, "variables")[[1 + indError]]
> eTerm <- deparse(errorterm[[2]], width = 500, backtick = TRUE)
> intercept <- attr(Terms, "intercept")
> ecall <- lmcall
> ecall$formula <-
> as.formula(paste(deparse(formula[[2]], width = 500,
> backtick = TRUE), "~", eTerm,
> if(!intercept) "- 1"),
> env=environment(formula))
>
> ecall$method <- "qr"
> ecall$qr <- TRUE
> ecall$contrasts <- NULL
> er.fit <- eval(ecall, parent.frame())
> options(opcons)
> nmstrata <- attr(terms(er.fit), "term.labels")
> ## remove backticks from simple labels for strata (only)
> nmstrata <- sub("^`(.*)`$", "\\1", nmstrata)
> if(intercept) nmstrata <- c("(Intercept)", nmstrata)
> qr.e <- er.fit$qr
> rank.e <- er.fit$rank
> qty <- er.fit$resid
> maov <- is.matrix(qty)
> asgn.e <- er.fit$assign[qr.e$piv[1:rank.e]]
> ## we want this to label the rows of qtx, not cols of x.
> nobs <- NROW(qty)
> if(nobs > rank.e) {
> ## APPARENT BUG 1:
> # result <- vector("list", max(asgn.e) + 2)
> # asgn.e[(rank.e+1):nobs] <- max(asgn.e) + 1
> ## SUGGESTED FIX:
> asgn.e[(rank.e+1):nobs] <- max(asgn.e) + 1
> result <- vector("list", length (unique(asgn.e)))
> ## END FIX
> nmstrata <- c(nmstrata, "Within")
> } else result <- vector("list", max(asgn.e) + 1)
> names(result) <- nmstrata
> lmcall$formula <- form <-
> update(formula, paste(". ~ .-", deparse(errorterm, width = 500,
> backtick = TRUE)))
> Terms <- terms(form)
> lmcall$method <- "model.frame"
> mf <- eval(lmcall, parent.frame())
> xvars <- as.character(attr(Terms, "variables"))[-1]
> if ((yvar <- attr(Terms, "response")) > 0)
> xvars <- xvars[-yvar]
> if (length(xvars) > 0) {
> xlev <- lapply(mf[xvars], levels)
> xlev <- xlev[!sapply(xlev, is.null)]
> } else xlev <- NULL
> resp <- model.response(mf)
> qtx <- model.matrix(Terms, mf, contrasts)
> cons <- attr(qtx, "contrasts")
> dnx <- colnames(qtx)
> asgn.t <- attr(qtx, "assign")
> if(length(wts <- model.weights(mf))) {
> wts <- sqrt(wts)
> resp <- resp * wts
> qtx <- qtx * wts
> }
> qty <- as.matrix(qr.qty(qr.e, resp))
> if((nc <- ncol(qty)) > 1) {
> dny <- colnames(resp)
> if(is.null(dny)) dny <- paste("Y", 1:nc, sep="")
> dimnames(qty) <- list(seq(nrow(qty)), dny)
> } else dimnames(qty) <- list(seq(nrow(qty)), NULL)
> qtx <- qr.qty(qr.e, qtx)
> dimnames(qtx) <- list(seq(nrow(qtx)) , dnx)
> for(i in seq(along=nmstrata)) {
> ## APPARENT BUG 2:
> # select <- asgn.e==(i-1)
> ## SUGGESTED FIX:
> select <- asgn.e==unique(asgn.e)[i]
> ## END FIX
> ni <- sum(select)
> if(!ni) next
> ## helpful to drop constant columns.
> xi <- qtx[select, , drop = FALSE]
> cols <- colSums(xi^2) > 1e-5
> if(any(cols)) {
> xi <- xi[, cols, drop = FALSE]
> attr(xi, "assign") <- asgn.t[cols]
> fiti <- lm.fit(xi, qty[select,,drop=FALSE])
> fiti$terms <- Terms
> } else {
> y <- qty[select,,drop=FALSE]
> fiti <- list(coefficients = numeric(0), residuals = y,
> fitted.values = 0 * y, weights = wts, rank = 0,
> df.residual = NROW(y))
> }
> if(projections) fiti$projections <- proj(fiti)
> class(fiti) <- c(if(maov) "maov", "aov", oldClass(er.fit))
> result[[i]] <- fiti
> }
> class(result) <- c("aovlist", "listof")
> if(qr) attr(result, "error.qr") <- qr.e
> attr(result, "call") <- Call
> if(length(wts)) attr(result, "weights") <- wts
> attr(result, "terms") <- allTerms
> attr(result, "contrasts") <- cons
> attr(result, "xlevels") <- xlev
> result
> }
> }
>
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
More information about the R-devel
mailing list