[R] confusion on levels() function, and how to assign a wanted order to factor levels, intentionally?

Peter Dalgaard P.Dalgaard at biostat.ku.dk
Tue Jun 16 11:42:48 CEST 2009


Mark Difford wrote:
> Hi Mao,
> 
>>> I am confused. And, I want to know how to assign a wanted order to factor 
>>> levels, intentionally?
> 
> You want ?relevel. Although the documentation leads one to think that it can
> only be used to set a reference level, with the other levels being moved
> down, presently it can in fact be used to set any order you wish. For a
> factor with just a few levels you could simply use an index into the default
> order.
> 
> ##
> new_d <- d
> c(5,1,6:10,2:4)
> new_d$population <- relevel(d$population,
> levels(d$population)[c(5,1,6:10,2:4)])
> 
> Ignore the warning. Note that relevel can also be used "on-the-fly," so
> without permanently changing level-order.

Now that's a dangerous strategy! You're relying on undocumented
behaviour and ignoring a warning message to boot. If someone implements
a check that ref is a scalar as assumed, you're shot.

Better to have a look at why stats:::relevel.factor currently works and
use the same mechanism:

    lev <- levels(x)
    if (is.character(ref))
        ref <- match(ref, lev)
    if (is.na(ref))
        stop("'ref' must be an existing level")
    nlev <- length(lev)
    if (ref < 1 || ref > nlev)
        stop(gettextf("ref = %d must be in 1:%d", ref, nlev),
            domain = NA)
    factor(x, levels = lev[c(ref, seq_along(lev)[-ref])])

and if you assume ref is an integer reordering of all the levels, this reduces to

    lev <- levels(x)
    factor(x, levels = lev[ref])

and if ref is a character vector, plain

    factor(x, levels=ref)

should do.
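
To make the two reduced forms concrete, here is a small sketch on made-up
data. The factor x and the particular orderings are illustrative assumptions,
not the poster's d$population. It assumes ref names every level exactly once;
any value of x whose level is left out of levels= would become NA.

```r
# Hypothetical data: the default (sorted) level order is a, b, c, d
x <- factor(c("b", "d", "a", "c", "b"))

# Integer reordering: index into the existing levels
ref_int <- c(3, 1, 4, 2)
x_int <- factor(x, levels = levels(x)[ref_int])

# Character reordering: give the level names directly in the wanted order
ref_chr <- c("c", "a", "d", "b")
x_chr <- factor(x, levels = ref_chr)

# Both approaches give the same level order, and the data values are unchanged
levels(x_int)
levels(x_chr)
as.character(x_int)
```

Only the order of the levels (and hence the underlying integer codes) changes;
the labels attached to each observation stay the same.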

(Or, you can go "full monty" and retain all the checks and balances, just
cure the warning using

    if (any(is.na(ref)))
        stop("'ref' must contain existing levels")
    ...
    if (any(ref < 1 | ref > nlev))

Maybe also check !any(duplicated(ref)) for good measure.
)
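
Put together, the vectorized checks might look like the sketch below. The
function name reorder_levels is hypothetical (it is not part of base R or
stats), and the error messages are only suggestions. It moves the levels named
in ref to the front and keeps the remaining ones in their original order, in
the spirit of relevel.factor; setdiff() is used for the remainder rather than
negative indexing, which is an assumption of this sketch made so that an empty
ref does not drop every level.

```r
# Hypothetical helper, not part of base R: reorder factor levels with checks
reorder_levels <- function(x, ref) {
    stopifnot(is.factor(x))
    lev <- levels(x)
    # Translate level names into positions within the current levels
    if (is.character(ref))
        ref <- match(ref, lev)
    if (any(is.na(ref)))
        stop("'ref' must contain existing levels")
    nlev <- length(lev)
    if (any(ref < 1 | ref > nlev))
        stop(gettextf("all of 'ref' must be in 1:%d", nlev), domain = NA)
    if (any(duplicated(ref)))
        stop("'ref' must not contain duplicates")
    # Levels listed in ref come first; the rest keep their relative order
    factor(x, levels = lev[c(ref, setdiff(seq_along(lev), ref))])
}

x <- factor(c("b", "d", "a", "c"))
levels(reorder_levels(x, c("c", "a")))   # partial ref given by name
levels(reorder_levels(x, c(4L, 1L)))     # partial ref given by position
```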

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907



