[R] Factor levels

Gabor Grothendieck ggrothendieck at gmail.com
Thu Sep 20 05:18:51 CEST 2007


If you don't know ahead of time how many columns you have, only that
they are a mix of numeric and character columns (the latter to be
converted to factors), then you can do this:

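# Input is the text block defined in the quoted message below.
# as.is = TRUE keeps character columns as character instead of factor.
DF <- read.table(textConnection(Input), header = TRUE, as.is = TRUE)
# Character columns become factors with levels in order of appearance.
f <- function(x) if (is.character(x)) factor(x, levels = unique(x)) else x
DF[] <- lapply(DF, f)  # assigning into DF[] keeps DF a data frame
DF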
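
For anyone who wants to paste this into a fresh session, here is a minimal self-contained sketch of the same idea. The four-row Input below is made up purely for illustration (it is not the data from this thread), and the column names a, b and c are assumptions.

```r
# Hypothetical three-column input, invented for illustration only.
Input <- "a b c
176 w k
141 k r
172 r k
182 w s
"

# as.is = TRUE leaves character columns as character vectors.
DF <- read.table(textConnection(Input), header = TRUE, as.is = TRUE)

# Turn every character column into a factor whose levels follow
# the order of first appearance; leave all other columns untouched.
f <- function(x) if (is.character(x)) factor(x, levels = unique(x)) else x
DF[] <- lapply(DF, f)

levels(DF$b)   # order of appearance: w, k, r
levels(DF$c)   # order of appearance: k, r, s
class(DF$a)    # "integer": non-character columns are left as they are
```

By contrast, a plain factor(x) sorts its levels, so column b would come out as k, r, w.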





On 9/19/07, Sébastien <pomchip at free.fr> wrote:
> Hi Gabor,
>
> I am coming back to you about the method you described to me a month ago to
> define the level order during a read.table call. I initially thought that I
> would need to apply the 'unique' function to a single column of my dataset,
> so I only used it after the read.table step (to make my life easier)...
> Well, I was wrong: I need to reorder all my columns (just to remind you, I
> don't know the number of columns my code has to handle). So, here comes
> trouble.
>
> I first tried to apply your code as is, although I thought there might be
> some problems. The class is actually not recycled when the list notation
> is used (the help says: "colClasses: character. A vector of classes to be
> assumed for the columns. Recycled as necessary..."). See the following
> example:
>
> ######################
> library(methods)
> setClass("my.factor")
> setAs("character", "my.factor",
>       function(from) factor(from, levels = unique(from)))
>
> Input <- "a b c d
> 1 1 175 n f
> 2 2 102 n j
> 3 3 187 o n
> 4 4 106 u g
> 5 5 102 o v
> 6 6 133 l x
> 7 7 149 w q
> 8 8 122 x p
> 9 9 151 u r
> 10 10 134 e g
> 11 11 170 j q
> 12 12 103 v n
> 13 13 153 n w
> 14 14 106 x x
> 15 15 185 v x
> 16 16 102 s p
> 17 17 181 i h
> 18 18 192 o k
> 19 19 161 d f
> 20 20 158 n q
> "
>
> DF <- read.table(textConnection(Input), header = TRUE,
>                  colClasses = list(c = "my.factor"))
> levels(DF$c)         # properly ordered
> levels(DF$d)         # not reordered
> ######################
>
> I also tried this:
>
> ######################
> DF <- read.table(textConnection(Input), header = TRUE,
>                  colClasses = c("my.factor"))
> levels(DF$c)
> levels(DF$d)
> ######################
>
> In this case, the class is definitely recycled, as all the columns of DF are
> transformed into factors... Not really useful :)
> I tried to modify the content of the list, or my second notation, by
> including "integer" or a second "my.factor"... but I did not have much
> success.
> Any idea how to use the class "my.factor" multiple times?
>
> Thanks in advance
>
>
> Gabor Grothendieck wrote:
> It's the same principle. Just change the function to be suitable. This one
> arranges the levels according to the input:
>
> library(methods)
> setClass("my.factor")
> setAs("character", "my.factor",
>       function(from) factor(from, levels = unique(from)))
>
> Input <- "a b c
> 1 1 176 w
> 2 2 141 k
> 3 3 172 r
> 4 4 182 s
> 5 5 123 k
> 6 6 153 p
> 7 7 176 l
> 8 8 170 u
> 9 9 140 z
> 10 10 194 s
> 11 11 164 j
> 12 12 100 j
> 13 13 127 x
> 14 14 137 r
> 15 15 198 d
> 16 16 173 j
> 17 17 113 x
> 18 18 144 w
> 19 19 198 q
> 20 20 122 f
> "
> DF <- read.table(textConnection(Input), header = TRUE,
>                  colClasses = list(c = "my.factor"))
> str(DF)


> On 8/28/07, Sébastien <pomchip at free.fr> wrote:
>
> Ok, I cannot send you one of my datasets since they are confidential. But
> I can produce a dummy "mini" dataset to illustrate my question. Let's say I
> have a csv file with 3 columns and 20 rows whose content is reproduced by
> the following line.
>
> > mydata <- data.frame(a = 1:20, b = sample(100:200, 20, replace = T),
> +                      c = sample(letters[1:26], 20, replace = T))
> > mydata
>     a   b c
> 1   1 176 w
> 2   2 141 k
> 3   3 172 r
> 4   4 182 s
> 5   5 123 k
> 6   6 153 p
> 7   7 176 l
> 8   8 170 u
> 9   9 140 z
> 10 10 194 s
> 11 11 164 j
> 12 12 100 j
> 13 13 127 x
> 14 14 137 r
> 15 15 198 d
> 16 16 173 j
> 17 17 113 x
> 18 18 144 w
> 19 19 198 q
> 20 20 122 f
>
> If I had to read the csv file, I would use something like:
> mydata<-data.frame(read.table(file="c:/test.csv",header=T))
>
> Now, if you look at mydata$c, the levels are alphabetically ordered.
>
> > mydata$c
>  [1] w k r s k p l u z s j j x r d j x w q f
> Levels: d f j k l p q r s u w x z
>
> What I am trying to do is to reorder the levels so as to have them in the
> order they appear in the table, i.e.
> Levels: w k r s p l u z j x d q f
>
> Again, keep in mind that my script should be used on datasets whose content
> is unknown to me. In my example, I have used letters for mydata$c, but my
> code may have to handle factors of numeric or character values (I need to
> transform specific columns of my dataset into factors for plotting
> purposes). My goal is to let the code scan the content of each factor of my
> data.frame during or after the read.table step and reorder their levels
> automatically without having to ask the user to hard-code the level order.
>
> In a way, my problem is more related to the way the factor levels are
> ordered than to the read.table function, although I guess there is a link...

> Gabor Grothendieck wrote:
> It's not clear from your description what you want. Could you be a bit more
> specific, including an example?
>
> On 8/28/07, Sébastien <pomchip at free.fr> wrote:

>
> Thanks Gabor, I have two questions:
>
> 1- Is there any difference between your code and the following one, with
> regard to Fld2?
>
> ### test ###
> Input <- "Fld1 Fld2
> 10 A
> 20 B
> 30 C
> 40 A
> "
> DF <- read.table(textConnection(Input), header = TRUE)
> DF$Fld2 <- factor(DF$Fld2, levels = c("C", "A", "B"))
>
> 2- Do you see any way to bring flexibility to your method? It looks to me
> as if, at this stage, I have to i) know the order of my levels before I
> read the table and ii) create one class per factor.
> My problem is that I am not really working on a specific dataset. My goal is
> to develop R scripts capable of handling datasets which have various contents
> but similar structures. So, I really need to minimize the quantity of
> "user-specific" code.
>
> Sebastien

> Gabor Grothendieck wrote:
> You can create your own class and pass that to read.table. In the example
> below, Fld2 is read in with factor levels C, A, B in that order.
>
> library(methods)
> setClass("my.levels")
> setAs("character", "my.levels",
>       function(from) factor(from, levels = c("C", "A", "B")))
>
> ### test ###
> Input <- "Fld1 Fld2
> 10 A
> 20 B
> 30 C
> 40 A
> "
> DF <- read.table(textConnection(Input), header = TRUE,
>                  colClasses = c("numeric", "my.levels"))
> str(DF)
> # or
> DF <- read.table(textConnection(Input), header = TRUE,
>                  colClasses = list(Fld2 = "my.levels"))
> str(DF)
>
> On 8/28/07, Sébastien <pomchip at free.fr> wrote:

>
> Dear R-users,
>
> I have found this not-so-recent post in the archives
> - http://tolstoy.newcastle.edu.au/R/devel/00a/0291.html -
> while I was looking for a particular way to reorder factor levels. The
> question addressed by the author was to know if the read.table function
> could be modified to order the levels of newly created factors "according to
> the order that they appear in the data file". Exactly what I am looking for.
>
> As there was no reply to this post, I wonder if any move has been made
> towards the implementation of this suggestion. A quick look at ?read.table
> tells me that if this option was implemented, it was not in the read.table
> function...
>
> Sebastien
>
> PS: I am sorry to post so many messages on the list, but I am learning R
> (basically by trial and error ;-) ) and no one around me has even a slight
> notion about it...

>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
