[R] factor : how does it work ?
Duncan Murdoch
murdoch at stats.uwo.ca
Thu Oct 6 16:32:37 CEST 2005
On 10/6/2005 10:20 AM, Florence Combes wrote:
>> > > 2d I can't manage to deal with factors, so when I have some, I transform
>> > > them in vectors (with levels()), but I think I miss the power and utility
>> > > of the factor type ?
>> >
>> > levels() is not the conversion you want.
>
>
> in fact I use
> 'as.numeric(levels(f))[f]'
> (from the ?factor description)
That will only work if the level labels are strings that can be converted
to numbers. In the example below, the levels are "a" and "b", so you'll
get NA values if you try this.
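
A minimal sketch of the difference, using two small factors made up for
illustration (f, g and the result names are not from the original example):

```r
# Levels that look like numbers: the idiom recovers the original values.
f <- factor(c("10", "5", "10"))
num_ok <- as.numeric(levels(f))[f]   # 10 5 10
codes  <- as.numeric(f)              # 1 2 1: internal codes, not the values

# Levels that are not numbers: every element becomes NA.
g <- factor(c("a", "b", "a"))
num_bad <- suppressWarnings(as.numeric(levels(g)))[g]   # NA NA NA
chr     <- as.character(g)                              # "a" "b" "a"
```

Note that as.numeric(f) on its own returns the internal integer codes,
which is why the idiom goes through levels() first.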
>
>> > That lists all the levels, but
>> > it doesn't tell you how they correspond to individual observations. For
>> > example,
>> >
>> > > df <- data.frame(x=1:3, y=c('a','b','a'))
>> > > df
>> >   x y
>> > 1 1 a
>> > 2 2 b
>> > 3 3 a
>> > > levels(df$y)
>> > [1] "a" "b"
>> >
>> > If you need to convert back to character values, use as.character():
>> >
>> > > as.character(df$y)
>> > [1] "a" "b" "a"
>
>
> got it.
>
>
>> > 1. You can't compare the levels of a factor unless you declared it to
>> > be ordered:
>> >
>> > > df$y[1] > df$y[2]
>> > [1] NA
>> > Warning message:
>> > > not meaningful for factors in: Ops.factor(df$y[1], df$y[2])
>> >
>> > but
>> >
>> > > df$y <- ordered(df$y)
>> > > df$y[1] > df$y[2]
>> > [1] FALSE
>> >
>> > However, you need to watch out here: the comparison is done by the order
>> > of the factors
>
>
> I am sorry I don't understand this.
> here you compare the position of a in the factor and the position of b in
> the factor ?
It's the positions of "a" and "b" in the levels() vector that are being
compared, not their positions in the data. I declared that the factor had
ordered levels, and R interprets that to mean that the first level is less
than the second level, etc. This is useful if you want to use meaningful
names for ordered categories: comparison will be by the order of the
categories, not by the names you chose.
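
A small sketch of that behaviour. It builds the factor directly with
factor() rather than through data.frame(), because the automatic
conversion of strings to factors that the quoted example relies on is no
longer the default in R 4.0.0 and later. The variable names here are
chosen for illustration only.

```r
# Ordered factor whose level order is declared explicitly.
y <- factor(c("before", "after", "before"),
            levels = c("before", "after"), ordered = TRUE)

lev_order  <- levels(y)            # "before" "after"
by_level   <- y[1] < y[2]          # TRUE: "before" is level 1, "after" is level 2
by_name    <- "before" < "after"   # FALSE: plain strings compare alphabetically
level_pos  <- as.integer(y)        # 1 2 1: positions in levels(y)
```

If the levels argument is left out, the levels are sorted, so "after"
would become the first level and the comparison would come out the other
way round.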
Duncan Murdoch
>
>> > , not an alphabetic comparison of their names:
>> >
>> > > levels(df$y) <- c("before", "after")
>> > > df
>> >   x      y
>> > 1 1 before
>> > 2 2  after
>> > 3 3 before
>> > > df$y[1] > df$y[2]
>> > [1] FALSE
>
>
> best regards,
>
> florence.
>