[R] Sorting the levels of a factor

Gordon Smyth smyth at wehi.EDU.AU
Wed Jul 4 02:48:31 CEST 2007


Dear Chunxu,

I'm cc'ing my reply to the r-help mailing list. As I said in one of 
my previous replies to you, your questions really have to do with use 
of factors in R rather than anything specifically to do with the 
limma package, so they should go to a help list.

The levels() function in R, when applied to a factor, is simply an 
extractor function. It extracts the levels attribute of the factor (a 
short character vector) which was set up when the factor itself was 
created. It doesn't change the levels in any way. For example, 
consider the following code:

   >  f <- factor( c("a","b","a","b"), levels=c("b","a"))
   >  levels(f)
   [1] "b" "a"

The factor() function sets up the factor with levels in a certain 
order. The levels() function simply extracts the previously defined 
levels without re-sorting them.

If targets$Class is a factor, the correct way to extract the levels is

    levels(targets$Class)

Your use of unique() is superfluous. There is no need for you to use 
unique() anywhere in conjunction with factors if you are using 
factors correctly in R.

When you use read.table() to read data into R, that function will 
automatically convert any character column that it finds in your file 
into a factor. By default, the factor levels will be the unique 
values of the character column in alphabetical order. If that's not 
what you want, you can define your factors explicitly, with levels in 
any order that you like, or else you can use read.table() with 
as.is=TRUE to suppress the creation of factors altogether.
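
As a rough sketch of those three options, the code below reads a small made-up table. The four Class values are borrowed from the output quoted further down; the text= input and the particular level order are assumptions for illustration only. stringsAsFactors=TRUE is spelled out because, in R 4.0.0 and later, read.table() no longer creates factors by default.

```r
# Hypothetical four-row targets table (values taken from the quoted output).
txt <- "Class
Normal
Tumor
Tumor_CN
Normal_CN"

# 1. Let read.table() create the factor: levels are the sorted unique values.
targets <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)
print(levels(targets$Class))

# 2. Suppress factor creation: the column stays a character vector.
raw <- read.table(text = txt, header = TRUE, as.is = TRUE)
print(is.character(raw$Class))

# 3. Define the factor explicitly, with levels in whatever order is wanted
#    (this particular order is just an example).
Class <- factor(raw$Class,
                levels = c("Normal", "Tumor", "Normal_CN", "Tumor_CN"))
print(levels(Class))
```

With the third form, levels(Class) returns the levels in exactly the order given to factor().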

Your last comment suggests that you might not be clear on the 
difference between the values of a factor and its levels. You refer 
to an 'array', but there is no array in your example. Your example 
shows the levels of the factor twice, in exactly the same order each 
time. The first row of your output is actually the values of the 
factor created by unique(), not the levels attribute of the factor.
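
A small sketch may make the distinction concrete. The factor below is a stand-in for targets$Class, rebuilt from the values in your output, so its construction is an assumption rather than your actual data.

```r
# Stand-in for targets$Class, with values in the order shown by unique().
Class <- factor(c("Normal", "Tumor", "Tumor_CN", "Normal_CN"))

# The values of the factor, in order of first appearance.
print(as.character(unique(Class)))

# The levels attribute, which unique() leaves as it was.
print(levels(unique(Class)))

# So calling unique() before levels() makes no difference here.
print(identical(levels(unique(Class)), levels(Class)))
```

The first print shows the values and the second shows the levels; they are different things, which is why they can appear in different orders.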

The levels of your factor are in alphabetical order because you read 
your data using read.table() in the first place.

Hope this helps
Best wishes
Gordon

At 11:21 PM 3/07/2007, Qu, Chunxu wrote:
>In our Linux cluster, unique did not sort the array but levels did.
>See below.
>Chunxu
>
> > unique(targets$Class)
>[1] Normal    Tumor     Tumor_CN  Normal_CN
>Levels: Normal Normal_CN Tumor Tumor_CN
> > levels(unique(targets$Class))
>[1] "Normal"    "Normal_CN" "Tumor"     "Tumor_CN"
>
>
>
>-----Original Message-----
>From: Gordon Smyth [mailto:smyth at wehi.EDU.AU]
>Sent: Tuesday, July 03, 2007 2:42 AM
>To: Qu, Chunxu
>Subject: RE: limma 2.9.17
>
>Dear Chunxu,
>
>At 06:32 AM 3/07/2007, Qu, Chunxu wrote:
...
> >         if (is.factor(levels))
> >                 levels = levels(levels)
> >
> >This WILL sort the levels.
>
>No, this doesn't sort the levels. It simply extracts the levels
>without changing their order.
...
>Best wishes
>Gordon
>
> >Chunxu


