[R] Bug in levels() function?
Thomas Lumley
tlumley at u.washington.edu
Mon Jan 28 20:03:51 CET 2008
This is not a bug; it is deliberately designed this way.
There are circumstances when you want to drop levels on subsetting and
other circumstances where you don't, so the default behaviour can't make
everyone happy. However, there is an option to get the behaviour you want:
> x<-as.factor(LETTERS)
> levels(x[1])
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
[20] "T" "U" "V" "W" "X" "Y" "Z"
> levels(x[1,drop=TRUE])
[1] "A"
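A minimal sketch extending the `x` example above, showing the `drop = TRUE` argument alongside two other ways to discard unused levels. It assumes `droplevels()` is available, which was added to base R in version 2.12.0 (later than the R 2.6.1 used in this thread); on older versions, `factor()` or `drop = TRUE` do the same job.

```r
# Factor with all 26 letters as levels
x <- as.factor(LETTERS)

# Default subsetting keeps every level of the original factor
kept <- x[1:3]
nlevels(kept)                # 26

# Three ways to drop the unused levels after subsetting
a <- x[1:3, drop = TRUE]     # the `drop` argument of `[.factor`
b <- factor(x[1:3])          # re-creating the factor recomputes its levels
d <- droplevels(x[1:3])      # base R >= 2.12.0 only

levels(a)                    # "A" "B" "C"
```

All three give the same three-level factor; `factor()` is the most portable choice, while `droplevels()` also works on whole data frames in newer R.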
On Mon, 28 Jan 2008, Groot, Philip de wrote:
> Hello all,
>
> I am not sure whether it actually is a bug, but it is not the behaviour I would expect. Please consider this:
>
>> Sibships
> [1] Patient_2400 Patient_2400 Patient_345 Patient_345 Patient_8901
> [6] Patient_8901 Patient_4008 Patient_4008 Patient_7991 Patient_7991
> [11] Patient_8353 Patient_8353 Patient_1212 Patient_1212 Patient_2168
> [16] Patient_2168 Patient_2760 Patient_2760 Patient_4726 Patient_4726
> [21] Patient_6699 Patient_6699 Patient_7641 Patient_7641 Patient_8263
> [26] Patient_8263 Patient_1389 Patient_1389 Patient_1618 Patient_1618
> [31] Patient_2410 Patient_2410 Patient_2612 Patient_2612 Patient_2721
> [36] Patient_2721 Patient_5053 Patient_5053 Patient_8458 Patient_8458
> [41] Patient_211 Patient_211 Patient_9004 Patient_9004 Patient_3423
> [46] Patient_3423 Patient_7413 Patient_7413 Patient_7815 Patient_7815
> [51] Patient_9232 Patient_9232 Patient_2267 Patient_2267 Patient_468
> [56] Patient_468
> 28 Levels: Patient_1212 Patient_1389 Patient_1618 Patient_211 ... Patient_9232
>
>> Comparison_Indices
> [1] TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
> [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE
> [49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
>
>> Sibships[Comparison_Indices]
> [1] Patient_2400 Patient_2400 Patient_345 Patient_345 Patient_8901
> [6] Patient_8901 Patient_7413 Patient_7413
> 28 Levels: Patient_1212 Patient_1389 Patient_1618 Patient_211 ... Patient_9232
>
> The problem with this last command is that I would expect 4 levels (because only 8 "Comparison_Indices" are TRUE, which corresponds to 4 sibships). So levels() does not take the indices into account; stated otherwise: if you take a subset of an array (vector), the levels() are not properly updated (in my opinion).
>
> What I additionally found is the following:
>> small_test <- factor(x=c("a", "b", "c"))
>> typeof(small_test)
> [1] "integer"
>
> The same happens with Sibships, which I defined as a factor. Why is it of type integer?
>
> This is the version() output:
>> version
> _
> platform x86_64-unknown-linux-gnu
> arch x86_64
> os linux-gnu
> system x86_64, linux-gnu
> status
> major 2
> minor 6.1
> year 2007
> month 11
> day 26
> svn rev 43537
> language R
> version.string R version 2.6.1 (2007-11-26)
>>
>
> So: should I submit a Bug report?
>
> Regards,
>
> Dr. Philip de Groot
> Wageningen University
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
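The original `Sibships` and `Comparison_Indices` objects are not available, so the following is a minimal sketch with a hypothetical, smaller vector built the same way (each patient ID appears twice, and the logical index selects 4 sibships, i.e. 8 elements). It shows that the subset keeps all levels by default and that `drop = TRUE` reduces them to the sibships actually selected.

```r
# Hypothetical stand-in for Sibships: 6 sibships, each ID listed twice
ids <- c("Patient_2400", "Patient_345", "Patient_8901",
         "Patient_4008", "Patient_7991", "Patient_7413")
sibships <- factor(rep(ids, each = 2))

# Hypothetical logical index selecting 4 sibships (8 TRUE values)
idx <- sibships %in% c("Patient_2400", "Patient_345",
                       "Patient_8901", "Patient_7413")

sub_all  <- sibships[idx]               # still carries all 6 levels
sub_drop <- sibships[idx, drop = TRUE]  # only the 4 selected levels

sum(idx)            # 8
nlevels(sub_all)    # 6
nlevels(sub_drop)   # 4
```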
Thomas Lumley Assoc. Professor, Biostatistics
tlumley at u.washington.edu University of Washington, Seattle
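On the second question in the quoted message (why `typeof()` reports "integer"): an R factor is stored as an integer vector of codes with a `levels` attribute and class "factor", so `typeof()` reports the storage type while `class()` reports the factor class. A short sketch using the `small_test` example from the message:

```r
small_test <- factor(x = c("a", "b", "c"))

typeof(small_test)        # "integer": the underlying storage type
class(small_test)         # "factor": what methods dispatch on
as.integer(small_test)    # 1 2 3: the level codes
levels(small_test)        # "a" "b" "c": the labels the codes index into

# Recovering the labels from the codes by hand ...
levels(small_test)[as.integer(small_test)]
# ... gives the same result as the idiomatic conversion
as.character(small_test)
```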