[R] Is this an artifact of using "which"?

Peter Dalgaard P.Dalgaard at biostat.ku.dk
Mon Apr 14 15:03:15 CEST 2008


Richard.Cotton at hsl.gov.uk wrote:
>> I used "which" to obtain a subset of values from my data.frame. 
>> However, I find that there is a "trace" of the values I have removed.
>> Any suggestions would be greatly appreciated.
>>
>> Below is my data:
>>
>> d <- data.frame( val   = 1:10,
>>                  group = sample(LETTERS[1:5], 10, replace = TRUE),
>>                  stringsAsFactors = TRUE )
>>
>>  > d
>>     val group
>> 1    1     B
>> 2    2     E
>> 3    3     B
>> 4    4     C
>> 5    5     A
>> 6    6     B
>> 7    7     A
>> 8    8     E
>> 9    9     E
>> 10  10     A
>>
>> ## selecting everything that is not group "A"
>>   d <- d[which(d$group != "A"), ]
>>
>>  > d
>>    val group
>> 1   1     B
>> 2   2     E
>> 3   3     B
>> 4   4     C
>> 6   6     B
>> 8   8     E
>> 9   9     E
>>
>>  > levels(d$group)
>> [1] "A" "B" "C" "E"
>>     
>
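A minimal check in R that the leftover level comes from the `[` subsetting itself rather than from which(). The data are rebuilt with an arbitrary seed (an assumption, so the sampled groups differ from the listing above), with `replace = TRUE` spelled out, and with `stringsAsFactors = TRUE`, which R 4.0.0 and later need to make `group` a factor at all. The names `d_which` and `d_logic` are illustrative only.

```r
## Rebuild the question's data; the seed is an arbitrary assumption
set.seed(1)
d <- data.frame(val   = 1:10,
                group = sample(LETTERS[1:5], 10, replace = TRUE),
                stringsAsFactors = TRUE)  # needed on R >= 4.0.0

## Subset once with which() and once with the plain logical index
d_which <- d[which(d$group != "A"), ]
d_logic <- d[d$group != "A", ]

## Both routes give the same factor, still carrying every original level
identical(d_which$group, d_logic$group)             # TRUE
identical(levels(d_which$group), levels(d$group))   # TRUE
```

The two forms only differ when the condition contains NA: which() silently skips those positions, whereas a logical index returns a row of NAs for each one.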
> The (imho) unintuitive behaviour is to do with the subsetting function 
> [.factor, not which.  There are a couple of workarounds:
>   
In that case, your intuition needs readjustment....

There are other systems which (de facto) drop unused levels by default,
and it is a real pain to work around, especially for subgroup analyses.
E.g. there is no way to get PROC FREQ in SAS to include a count of zero,
and barplots of ratings from 0 to 10 lose columns "randomly" in SPSS
(this _can_ be worked around, though).

Anyways, it is illogical: There's no reason that a tabulation of gender
distribution for (say) tenured CS professors should suddenly pretend
that the female gender does not exist!
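The tabulation argument can be illustrated with a small, made-up data set (the data frame `staff` and its columns are hypothetical): after subsetting, the retained level still appears in table() with an explicit count of zero, whereas re-creating the factor makes that category disappear from the table.

```r
## Hypothetical staff data; both levels occur in the full data
staff <- data.frame(
  gender  = factor(c("F", "M", "M", "F", "M"), levels = c("F", "M")),
  tenured = c(FALSE, TRUE, TRUE, FALSE, TRUE)
)

## Keep only the tenured rows: none of them is "F"
tenured <- staff[staff$tenured, ]

## The level "F" is kept, so the tabulation reports a zero for it
tab <- table(tenured$gender)
tab
##
## F M
## 0 3

## Re-creating the factor drops the unused level, and "F" vanishes
table(factor(tenured$gender))
##
## M
## 3
```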

> 1. Call factor() again to recreate the levels and get rid of "A"
> d$group <- factor(d$group)
>
> 2. Redefine [.factor; see dropUnusedLevels in the Hmisc package.
>
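A sketch of the first workaround, plus two built-in alternatives, under the same assumptions as the question's data (arbitrary seed, `stringsAsFactors = TRUE`); `sub`, `g` and `sub2` are illustrative names. Note that the result of factor() has to be assigned back, `drop = TRUE` is an argument of `[.factor`, and droplevels() only arrived in R 2.12.0, after this thread. Hmisc's `dropUnusedLevels` is not shown here.

```r
## Rebuild the question's data (seed is an assumption)
set.seed(1)
d <- data.frame(val   = 1:10,
                group = sample(LETTERS[1:5], 10, replace = TRUE),
                stringsAsFactors = TRUE)

## Workaround 1: subset, then re-create the factor and assign it back
sub <- d[d$group != "A", ]
sub$group <- factor(sub$group)

## `[.factor` accepts drop = TRUE, which discards unused levels
g <- d$group[d$group != "A", drop = TRUE]

## droplevels() (R >= 2.12.0) re-levels every factor column of a data frame
sub2 <- droplevels(d[d$group != "A", ])

levels(sub$group)
levels(g)
levels(sub2$group)
```

Applied to a data frame, droplevels() handles all factor columns at once, which is convenient when several grouping variables are involved.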
> Regards,
> Richie.
>
> Mathematical Sciences Unit
> HSL



More information about the R-help mailing list