[R] Sorting the levels of a factor
Gordon Smyth
smyth at wehi.EDU.AU
Wed Jul 4 02:48:31 CEST 2007
Dear Chunxu,
I'm cc'ing my reply to the r-help mailing list. As I said in one of
my previous replies to you, your questions really have to do with use
of factors in R rather than anything specifically to do with the
limma package, so they should go to a help list.
The levels() function in R, when applied to a factor, is simply an
extractor function. It extracts the levels attribute of the factor (a
short character vector), which was set up when the factor itself was
created. It doesn't change the levels in any way. For example,
consider the following code:
> f <- factor( c("a","b","a","b"), levels=c("b","a"))
> levels(f)
[1] "b" "a"
The factor() function sets up the factor with levels in a certain
order. The levels() function simply extracts the previously defined
levels without re-sorting them.
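If you do want the levels of an existing factor in a different order,
one common approach is to rebuild the factor with an explicit levels
argument. The following is only a sketch that continues the small
example above; the name g is made up for illustration.

```r
# Same factor as in the example above: levels deliberately set to "b", "a".
f <- factor(c("a", "b", "a", "b"), levels = c("b", "a"))
levels(f)          # extracts the levels as defined: "b" "a"

# Rebuild the factor with the levels sorted. The values are unchanged;
# only the order of the levels attribute differs.
g <- factor(f, levels = sort(levels(f)))
levels(g)          # "a" "b"
as.character(g)    # "a" "b" "a" "b"
```

Note that levels(f) itself never reorders anything; the reordering
happens in the call to factor().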
If targets$Class is a factor, the correct way to extract the levels is
levels(targets$Class)
Your use of unique() is superfluous. There is no need for you to use
unique() anywhere in conjunction with factors if you are using
factors correctly in R.
When you use read.table() to read data into R, that function will
automatically convert any character column that it finds in your file
into a factor. By default, the factor levels will be the unique
values of the character column in alphabetical order. If that's not
what you want, you can define your factors explicitly, with levels in
any order that you like, or else you can use read.table() with
as.is=TRUE to suppress the creation of factors altogether.
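As a rough illustration of the second option, the sketch below reads a
small made-up table with as.is=TRUE and then defines the factor
explicitly. The column names and sample labels are assumptions; your
actual targets file is not shown in this thread. (In R 4.0.0 and
later, read.table() no longer creates factors by default, so there
as.is=TRUE just makes the behaviour explicit.)

```r
# Hypothetical stand-in for a targets file (tab-delimited text).
txt <- "Sample\tClass\ns1\tNormal\ns2\tTumor\ns3\tTumor_CN\ns4\tNormal_CN"

# as.is=TRUE keeps the Class column as a plain character vector.
targets <- read.table(text = txt, header = TRUE, sep = "\t", as.is = TRUE)
is.character(targets$Class)

# Define the factor explicitly, with the levels in the order you want.
targets$Class <- factor(targets$Class,
                        levels = c("Normal", "Tumor", "Tumor_CN", "Normal_CN"))
levels(targets$Class)   # "Normal" "Tumor" "Tumor_CN" "Normal_CN"
```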
Your last comment suggests that you might not be clear on the
difference between the values of a factor and its levels. You refer
to an 'array', but there is no array in your example. Your example
shows the levels of the factor twice, in exactly the same order each
time. The first row of your output is actually the values of the
factor created by unique(), not the levels attribute of the factor.
The levels of your factor are in alphabetical order because you read
your data using read.table() in the first place.
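The difference between values and levels can be seen in a small
sketch that imitates your output. The factor below is built by hand
with default (alphabetical) levels, as an assumed stand-in for
targets$Class; the names f and u are for illustration only.

```r
# Stand-in for targets$Class: values in file order, default sorted levels.
f <- factor(c("Normal", "Tumor", "Tumor_CN", "Normal_CN"))

u <- unique(f)

# Values of the factor returned by unique(): order of first appearance.
as.character(u)   # "Normal" "Tumor" "Tumor_CN" "Normal_CN"

# Levels attribute: carried over unchanged from f, alphabetical.
levels(u)         # "Normal" "Normal_CN" "Tumor" "Tumor_CN"

# unique() did not alter the levels, and levels() did not sort anything.
identical(levels(u), levels(f))   # TRUE
```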
Hope this helps
Best wishes
Gordon
At 11:21 PM 3/07/2007, Qu, Chunxu wrote:
>In our Linux cluster, unique did not sort the array but levels did.
>See below.
>Chunxu
>
> > unique(targets$Class)
>[1] Normal Tumor Tumor_CN Normal_CN
>Levels: Normal Normal_CN Tumor Tumor_CN
> > levels(unique(targets$Class))
>[1] "Normal" "Normal_CN" "Tumor" "Tumor_CN"
>
>
>
>-----Original Message-----
>From: Gordon Smyth [mailto:smyth at wehi.EDU.AU]
>Sent: Tuesday, July 03, 2007 2:42 AM
>To: Qu, Chunxu
>Subject: RE: limma 2.9.17
>
>Dear Chunxu,
>
>At 06:32 AM 3/07/2007, Qu, Chunxu wrote:
...
> > if (is.factor(levels))
> > levels = levels(levels)
> >
> >This WILL sort the levels.
>
>No, this doesn't sort the levels. It simply extracts the levels
>without changing their order.
...
>Best wishes
>Gordon
>
> >Chunxu