[Rd] Take marginality into account in factor.scope

ONKELINX, Thierry Thierry.ONKELINX at inbo.be
Mon Apr 28 15:40:03 CEST 2008


Dear devellopeRs,

I've altered factor.scope() from the stats package so it can take
marginality into account. Therefore I've added the argument marginal.
When marginal is NULL, factor.scope() works like the original version.
The marginality is to be specified as a named list. The name of the
elements is the name of a variable. The content of the element is a
character vector with the variable(s) that must be in the model.

E.g. marginal = list(D = c("B", "C"), A3 = "A2", A2 = "A1")
Means that
- D can only enter the model if B and C are present.
- A2 can only enter the model if A1 is present.
- A3 can only enter the model if A2 is present. Since the list contains
A2 = "A1", this implies that A1 needs to be present too.

- B can only leave the model if D is not present.
- C can only leave the model if D is not present.
- A2 can only leave the model if A3 is not present.
- A1 can only leave the model if A2 and A3 are not present.

It copes with interactions too.

A1:B needs 
	by default
		A1 and B
	due to marginality
		none

A2:B needs
	by default
		A2 and B
	due marginality
		A1 and A1:B

Comments are welcome,

Cheers,

Thierry



factor.scope <- function (factor, scope, marginal = NULL) {
    drop <- scope$drop
    add <- scope$add
    if (length(factor) && !is.null(drop)) {
        nmdrop <- colnames(drop)
        facs <- factor
        if (length(drop)) {
            nmfac <- colnames(factor)
            nmfac0 <- sapply(strsplit(nmfac, ":", fixed = TRUE),
function(x) paste(sort(x), collapse = ":"))
            nmdrop0 <- sapply(strsplit(nmdrop, ":", fixed = TRUE),
function(x) paste(sort(x), collapse = ":"))
            where <- match(nmdrop0, nmfac0, 0)
            if (any(!where)){
                stop(gettextf("lower scope has term(s) %s not included
in model",  paste(sQuote(nmdrop[where == 0]), collapse = ", ")), domain
= NA)
            }
            facs <- factor[, -where, drop = FALSE]
            nmdrop <- nmfac[-where]
        } else { 
            nmdrop <- colnames(factor)
        }
        if (ncol(facs) > 1) {
            keep <- rep.int(TRUE, ncol(facs))
            f <- crossprod(facs > 0)
            for (i in seq(keep)){
                keep[i] <- max(f[i, -i]) != f[i, i]
            }
            nmdrop <- nmdrop[keep]
        }
        keep <- unlist(lapply(strsplit(nmdrop, ":", fixed = TRUE),
function(x){
            whereMarginal <- match(x, names(marginal))
            if(!all(is.na(whereMarginal))){
                if(length(x) == 1){
                    marginal[[whereMarginal]]
                } else {
                    lowerOrderTerms <- lapply(seq_along(whereMarginal),
function(z){
                        if(is.na(whereMarginal[z])){
                           x[z]
                        } else {
                            c(x[z], marginal[[whereMarginal[z]]])
                        }
                    })
                    lowerOrder <- apply(expand.grid(lowerOrderTerms), 1,
function(z){
                        paste(sort(z), collapse = ":")
                    })
                    lowerOrder[!lowerOrder %in% paste(sort(x), collapse
= ":")]
                }
            }
        }))
        nmdrop <- nmdrop[!nmdrop %in% keep]
    } else {
        nmdrop <- character(0)
    }
    if (!length(add)){
        nmadd <- character(0)
    } else {
        nmfac <- colnames(factor)
        nmadd <- colnames(add)
        if (!is.null(nmfac)) {
            nmfac0 <- sapply(strsplit(nmfac, ":", fixed = TRUE),
function(x) paste(sort(x), collapse = ":"))
            nmadd0 <- sapply(strsplit(nmadd, ":", fixed = TRUE),
function(x) paste(sort(x), collapse = ":"))
            where <- match(nmfac0, nmadd0, 0)
            if (any(!where)){
                stop(gettextf("upper scope does not include model
term(s) %s", paste(sQuote(nmfac[where == 0]), collapse = ", ")), domain
= NA)
            }
            nmadd <- nmadd[-where]
            add <- add[, -where, drop = FALSE]
        }
        if (ncol(add) > 1) {
            keep <- rep.int(TRUE, ncol(add))
            f <- crossprod(add > 0)
            for (i in seq(keep)){
                keep[-i] <- keep[-i] & (f[i, -i] < f[i, i])
            }
            nmadd <- nmadd[keep]
        }
        if(length(nmadd)){
            keep <- sapply(strsplit(nmadd, ":", fixed = TRUE),
function(x){
                whereMarginal <- match(x, names(marginal))
                if(!all(is.na(whereMarginal))){
                    if(length(x) == 1){
                        all(!marginal[[whereMarginal]] %in% nmadd)
                    } else {
                        lowerOrderTerms <-
lapply(seq_along(whereMarginal), function(z){
                            if(is.na(whereMarginal[z])){
                               x[z]
                            } else {
                                c(x[z], marginal[[whereMarginal[z]]])
                            }
                        })
                        lowerOrder <-
apply(expand.grid(lowerOrderTerms), 1, function(z){
                            paste(sort(z), collapse = ":")
                        })
                        lowerOrder <- lowerOrder[!lowerOrder %in%
paste(sort(x), collapse = ":")]
                        all(!lowerOrder %in% nmadd)
                    }
                } else {
                    TRUE
                }
            })
            nmadd <- nmadd[keep]
        }
    }
    list(drop = nmdrop, add = nmadd)
}

------------------------------------------------------------------------
----
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium 
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be 
www.inbo.be 

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey



More information about the R-devel mailing list