[Rd] bug and enhancement to split?

Martin Morgan mtmorgan at fhcrc.org
Sun Jan 27 20:20:37 CET 2013


 > R.version.string
[1] "R Under development (unstable) (2013-01-26 r61752)"

'split.default' recycles a short factor when 'x' has no class attribute, but 
not when 'x' is an instance of a class

 > split(1:5, 1:2)
$`1`
[1] 1 3 5

$`2`
[1] 2 4

Warning message:
In split.default(1:5, 1:2) :
   data length is not a multiple of split variable
 > x = structure(1:5, class="A")
 > split(x, 1:2)
$`1`
[1] 1

$`2`
[1] 2

This is also inconsistent with split<-, which does recycle

 > split(x, 1:2) <- 1:2
Warning message:
In split.default(seq_along(x), f, drop = drop, ...) :
   data length is not a multiple of split variable
 > x
[1] 1 2 1 2 1
attr(,"class")
[1] "A"

A solution is to change the call to seq_along(f) near the end of split.default 
to seq_along(x).

@@ -32,7 +32,7 @@
      lf <- levels(f)
      y <- vector("list", length(lf))
      names(y) <- lf
-    ind <- .Internal(split(seq_along(f), f))
+    ind <- .Internal(split(seq_along(x), f))
      for(k in lf) y[[k]] <- x[ind[[k]]]
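
The same idea can be sketched from user code without patching split.default: 
split the indices of x, which are a plain integer vector and so do recycle f, 
then subset x by each group of indices. The helper name split_by_index is 
hypothetical and not part of base R; this is an illustration of the approach, 
not the patched function itself.

```r
# Hypothetical helper (not in base R): split a classed vector via its indices.
split_by_index <- function(x, f) {
    # seq_along(x) is unclassed, so the short 'f' is recycled along it
    ind <- split(seq_along(x), f)
    # "[" dispatches on any method defined for class(x)
    lapply(ind, function(i) x[i])
}

x <- structure(1:5, class = "A")
# 5 is not a multiple of 2, so the usual recycling warning is suppressed here
res <- suppressWarnings(split_by_index(x, 1:2))
sapply(res, length)   # group "1" has 3 elements, group "2" has 2
```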

The following may be a little harder to argue. In split.default, for a class 
for which one might wish to develop factor-like behaviour, e.g.,

   Rle = setClass("Rle", representation(values="integer", lengths="integer"))
   f = Rle(values=1:2, lengths=2:3)

the code

     if (is.list(f))
         f <- interaction(f, drop = drop, sep = sep)
     else if (drop || !is.factor(f))
         f <- factor(f)

requires that one make factor a generic and write a factor.Rle method. 
This contradicts the documentation

        f: a ‘factor’ in the sense that ‘as.factor(f)’ defines the
           grouping, or a list of such factors in which case their
           interaction is used for the grouping.

and perhaps the more common (?) pattern of coercion using as.*. A solution is to 
make as.factor a generic and revise the code above to use something like

      if (is.list(f)) f <- interaction(f, drop = drop, sep = sep)
      else if (!is.factor(f)) f <- as.factor(f)
      else if (drop) f <- factor(f)

One then gets split behaviour if there is an as.factor.Rle method

     as.factor.Rle <- function(x, ...)
         factor(rep(x@values, x@lengths), levels=unique(x@values))
     setAs("Rle", "factor", function(from) as.factor.Rle(from))
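
As a rough check of what the method computes, the sketch below calls 
as.factor.Rle directly and passes the result to split. It assumes an unpatched 
R in which as.factor is an ordinary function rather than a generic, so the 
coercion has to be explicit; the object names g and res are illustrative only.

```r
library(methods)

# Class and method as in the text above
Rle <- setClass("Rle", representation(values = "integer", lengths = "integer"))
as.factor.Rle <- function(x, ...)
    factor(rep(x@values, x@lengths), levels = unique(x@values))

f <- Rle(values = 1:2, lengths = 2:3)

# Explicit coercion: expands the run-length encoding to 1 1 2 2 2
g <- as.factor.Rle(f)
res <- split(1:5, g)   # elements 1:2 fall in group "1", 3:5 in group "2"
```

If as.factor were generic and split.default used it as proposed, the explicit 
call to as.factor.Rle would presumably no longer be needed.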

These more elaborate changes are in the attached diff.

Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
-------------- next part --------------
A non-text attachment was scrubbed...
Name: split.diff.tar.gz
Type: application/x-gzip
Size: 1184 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20130127/8541e1ea/attachment.gz>
