[BioC] Did the behavior of as.vector(Rle(some.factor)) change on purpose?
Martin Morgan
mtmorgan at fhcrc.org
Tue Aug 31 17:37:29 CEST 2010
On 08/31/2010 07:15 AM, Steve Lianoglou wrote:
> Hi all,
>
> It looks as if the as.vector call to a run length encoded factor turns
> it to a vector of characters.
>
> Did this happen by accident, or was it a deliberate design decision?
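It was a bug fix. as.vector() on a plain factor returns a character
vector, while as.factor() keeps the factor: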
> x = factor(letters)
> as.vector(x)
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[20] "t" "u" "v" "w" "x" "y" "z"
> as.factor(x)
[1] a b c d e f g h i j k l m n o p q r s t u v w x y z
Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
So an Rle of a factor now behaves the same way:
> Rle(factor(letters))
'factor' Rle of length 26 with 26 runs
Lengths: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Values : a b c d e f g h i j k l m n o p q r s t u v w x y z
Levels(26): a b c d e f g h i j k l m n o p q r s t u v w x y z
> as.vector(Rle(factor(letters)))
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[20] "t" "u" "v" "w" "x" "y" "z"
> as.factor(Rle(factor(letters)))
[1] a b c d e f g h i j k l m n o p q r s t u v w x y z
Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
There might be edge cases where our own code has not caught up with the
fix; please let us know if you run into any.
> packageDescription('IRanges')$Version
[1] "1.7.32"
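If downstream code relied on getting a factor back, one possible workaround is to rebuild the factor from the character vector using the original levels. The sketch below is only an illustration under assumptions: it uses a plain base-R factor rather than an Rle, and the names x, v, and f are made up for the example. For an actual Rle, as.factor() as shown above is the more direct route.

```r
# Illustrative only: 'x', 'v', and 'f' are made-up names, and this uses a
# plain base-R factor rather than an Rle.
x <- factor(c("+", "-", "+", "+", "-"), levels = c("+", "-", "*"))

v <- as.vector(x)                    # character vector; levels are dropped
f <- factor(v, levels = levels(x))   # rebuild the factor with the original levels

is.character(v)   # TRUE
identical(f, x)   # TRUE
```

Passing levels explicitly keeps unused levels such as "*", which would otherwise be lost when converting through a character vector.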
Martin
>
> Previously:
>
> R-2.12, IRanges_1.7.19, GenomicRanges_1.1.20
> (A factor of length one is returned):
>
> R> a <- Rle(strand(c('+', '-', '+', '+', '-')))
> R> as.vector(a[1])
> [1] +
> Levels: + - *
>
> =============================
>
> Now:
> R-2.12, IRanges_1.7.31, GenomicRanges_1.1.20 (The factor is converted
> to a character)
>
> R> a <- Rle(strand(c('+', '-', '+', '+', '-')))
> R> as.vector(a[1])
> [1] "+"
>
> It seems like it would do what is expected (by me :-) if the
> `getMethod('as.vector', c("Rle", "missing"))` was changed from:
>
> function (x, mode = "any")
> rep.int(as.vector(runValue(x)), runLength(x))
>
> To:
>
> function (x, mode = "any")
> rep.int(runValue(x), runLength(x))
>
> but, upon further inspection, it seems like this was how it was
> defined previously anyway, so ... I guess something motivated this
> change?
>
> The complete sessionInfo for my last (buggy(?)) case is:
>
> R version 2.12.0 Under development (unstable) (2010-07-07 r52477)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=C
> [6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] GenomicRanges_1.1.20 IRanges_1.7.31
>
> loaded via a namespace (and not attached):
> [1] tools_2.12.0
>
> Thanks,
> -steve
>
>
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793