[R] tapply bug? - levels of a factor in a data frame after tapply are intermixed

Dimitri Liakhovitski ld7631 at gmail.com
Fri Feb 13 18:38:20 CET 2009


On Fri, Feb 13, 2009 at 12:24 PM, Marc Schwartz
<marc_schwartz at comcast.net> wrote:
> on 02/13/2009 11:09 AM Dimitri Liakhovitski wrote:
>> Hello! I have encountered a really weird problem. Maybe you've
>> encountered it before?
>> I have a large data frame "importances". It has one factor ($A) with 3
>> levels: 3, 9, and 15. $B is a regular numeric variable.
>> Below I am picking a really small sub-frame (just 3 rows) based on
>> "indices". "indices" were chosen so that all 3 levels of A are
>> present:
>>
>> indices=c(14329,14209,14353)
>> test=data.frame(yy=importances[["B"]][indices],xx=importances[["A"]][indices])
>> Here is what the new data frame "test" looks like:
>>
>>             yy xx
>> 1 -0.009984006  9
>> 2 -2.339904131  3
>> 3 -0.008427385 15
>>
>> Here is the structure of "test":
>>> str(test)
>> 'data.frame':   3 obs. of  2 variables:
>>  $ yy: num  -0.00998 -2.3399 -0.00843
>>  $ xx: Factor w/ 3 levels "3","9","15": 2 1 3
>>
>> Notice that the order of the factor levels for xx is not 1 2 3, as it
>> should be, but 2 1 3. How come?
>>
>> Or also look at this:
>>> test$xx
>> [1] 9  3  15
>> Levels: 3 9 15
>>
>> Same thing.
>> Do you know what might be the reason?
>>
>> Thank you very much!
>
> The output of str() is showing you the factor levels of test$xx,
> followed by the internal integer codes for the three actual values of
> test$xx, 9, 3, and 15:
>
>> str(test$xx)
>  Factor w/ 3 levels "3","9","15": 2 1 3
>
>> levels(test$xx)
> [1] "3"  "9"  "15"
>
>> as.integer(test$xx)
> [1] 2 1 3
>
> 9 is the second level, hence the 2
> 3 is the first level, hence the 1
> 15 is the third level, hence the 3.
>
> No problems, just clarification needed on what you are seeing.
>
> Note that you do not reference anything above regarding tapply() as per
> your subject line, though I suspect that I have an idea as to why you did...
>
> HTH,
>
> Marc Schwartz
>
>
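A minimal reproducible sketch of Marc's point, added for illustration and not part of the original thread. The data frame `importances` is not available, so the hypothetical `test` below is rebuilt from the values printed above. It shows that `str()` and `as.integer()` report each value's position within `levels()`, not the order of the levels themselves.

```r
# Hypothetical stand-in for the 3-row data frame in the post;
# the values are copied from the printed output above.
test <- data.frame(
  yy = c(-0.009984006, -2.339904131, -0.008427385),
  xx = factor(c(9, 3, 15))
)

# factor() sorts the distinct values to form the levels
levels(test$xx)          # "3" "9" "15"

# Each element is stored as the position of its value within levels():
# 9 is level 2, 3 is level 1, 15 is level 3
as.integer(test$xx)      # 2 1 3
```

The codes follow the row order of the values, so any subset whose rows are not in level order will show a "scrambled" sequence even though nothing is wrong.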

Marc (and everyone), I expected it to show:
$ xx: Factor w/ 3 levels "3","9","15":  1 2 3
rather than what I am seeing:
$ xx: Factor w/ 3 levels "3","9","15":  2 1 3
I expected this because 3 is level 1, 9 is level 2, and 15 is level 3.
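A sketch of when the expected `1 2 3` display would appear, using the same hypothetical reconstruction of `test` (an assumption, since the real data are not shown): the codes only read 1 2 3 when the rows happen to be in level order. Sorting the rows by `xx` produces that display while leaving the levels unchanged.

```r
# Hypothetical reconstruction of the 3-row data frame from the post
test <- data.frame(
  yy = c(-0.009984006, -2.339904131, -0.008427385),
  xx = factor(c(9, 3, 15))
)

# order() on a factor sorts by its integer codes, i.e. by level order
sorted <- test[order(test$xx), ]

as.integer(sorted$xx)    # 1 2 3
levels(sorted$xx)        # still "3" "9" "15"
str(sorted)
```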
I have several other factors in my original data frame, and I've run the
same tapply on all of them (with the same dependent variable); in all of
them the first level was 1, the second 2, etc.
Why am I concerned about this? Because I am plotting the means of the
numeric variable against the levels of the factor, and it's important to
me that the factor levels are correct (in the right order)...
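For the plotting concern, a sketch under the same assumptions (the reconstructed `test`; the names `means` and `x` are illustrative only). `tapply()` returns one value per level, in level order, so the group means already come out as 3, 9, 15 regardless of row order. If the x-axis should use the actual numbers, note that `as.numeric()` on a factor returns the internal codes; the usual idiom from R FAQ 7.10 converts via the levels instead.

```r
# Hypothetical reconstruction of the 3-row data frame from the post
test <- data.frame(
  yy = c(-0.009984006, -2.339904131, -0.008427385),
  xx = factor(c(9, 3, 15))
)

# Group means: one value per level, named and ordered by levels(test$xx)
means <- tapply(test$yy, test$xx, mean)
names(means)                             # "3" "9" "15"

# Pitfall: as.numeric() on a factor gives the internal codes
as.numeric(test$xx)                      # 2 1 3
# Converting via the levels recovers the original values (R FAQ 7.10)
as.numeric(levels(test$xx))[test$xx]     # 9 3 15

# x positions for a plot of the means, in level order
x <- as.numeric(names(means))            # 3 9 15
if (interactive()) {
  plot(x, as.vector(means), type = "b", xlab = "xx", ylab = "mean of yy")
}
```

Plotting against `x` rather than against the factor codes also keeps the spacing between 3, 9, and 15 proportional on the axis.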
Dimitri


-- 
Dimitri Liakhovitski
MarketTools, Inc.
Dimitri.Liakhovitski at markettools.com



