[R] Factor levels

Gabor Grothendieck ggrothendieck at gmail.com
Tue Aug 28 23:16:55 CEST 2007


It's the same principle.  Just change the conversion function to suit.  This one
orders the levels by first appearance in the input:

library(methods)
setClass("my.factor")
setAs("character", "my.factor",
 function(from) factor(from, levels = unique(from)))

Input <- "a b c
1   1 176 w
2   2 141 k
3   3 172 r
4   4 182 s
5   5 123 k
6   6 153 p
7   7 176 l
8   8 170 u
9   9 140 z
10 10 194 s
11 11 164 j
12 12 100 j
13 13 127 x
14 14 137 r
15 15 198 d
16 16 173 j
17 17 113 x
18 18 144 w
19 19 198 q
20 20 122 f
"
DF <- read.table(textConnection(Input), header = TRUE,
  colClasses = list(c = "my.factor"))
str(DF)
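
If the level order has to be fixed after the data have already been read, the same unique() idea can be applied column by column. A minimal sketch, assuming every factor or character column should be re-levelled; the helper name reorder_by_appearance is made up for illustration and is not part of R or of the code above:

```r
# Hypothetical helper (not from the thread): reorder the levels of every
# factor or character column by first appearance, after the data are read.
reorder_by_appearance <- function(df) {
  is_cat <- vapply(df, function(x) is.factor(x) || is.character(x), logical(1))
  if (any(is_cat)) {
    df[is_cat] <- lapply(df[is_cat], function(x) {
      x <- as.character(x)
      # NA is left out of the levels, as factor() does by default
      factor(x, levels = unique(x[!is.na(x)]))
    })
  }
  df
}

DF2 <- data.frame(a = 1:5,
                  c = c("w", "k", "r", "k", "w"),
                  stringsAsFactors = TRUE)
levels(DF2$c)                          # alphabetical: "k" "r" "w"
DF2 <- reorder_by_appearance(DF2)
levels(DF2$c)                          # first appearance: "w" "k" "r"

# A numeric column that should become a factor can be handled the same way
b <- c(176, 141, 172, 141)
levels(factor(b, levels = unique(b)))  # "176" "141" "172"
```

This needs no class definition per column and no advance knowledge of the levels, at the cost of a second pass over the data frame.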


On 8/28/07, Sébastien <pomchip at free.fr> wrote:
> OK, I cannot send you one of my datasets since they are confidential. But
> I can produce a dummy "mini" dataset to illustrate my question. Let's say I
> have a csv file with 3 columns and 20 rows whose content is reproduced by
> the following line.
>
> > mydata <- data.frame(a = 1:20,
>     b = sample(100:200, 20, replace = TRUE),
>     c = sample(letters[1:26], 20, replace = TRUE))
> > mydata
>     a   b c
> 1   1 176 w
> 2   2 141 k
> 3   3 172 r
> 4   4 182 s
> 5   5 123 k
> 6   6 153 p
> 7   7 176 l
> 8   8 170 u
> 9   9 140 z
> 10 10 194 s
> 11 11 164 j
> 12 12 100 j
> 13 13 127 x
> 14 14 137 r
> 15 15 198 d
> 16 16 173 j
> 17 17 113 x
> 18 18 144 w
> 19 19 198 q
> 20 20 122 f
>
> If I had to read the csv file, I would use something like:
> mydata<-data.frame(read.table(file="c:/test.csv",header=T))
>
> Now, if you look at mydata$c, the levels are alphabetically ordered.
> > mydata$c
>  [1] w k r s k p l u z s j j x r d j x w q f
> Levels: d f j k l p q r s u w x z
>
> What I am trying to do is to reorder the levels so that they are in the order
> in which they appear in the table, i.e.
> Levels: w k r s p l u z j x d q f
>
> Again, keep in mind that my script should be used on datasets whose contents
> are unknown to me. In my example, I have used letters for mydata$c, but my
> code may have to handle factors of numeric or character values (I need to
> transform specific columns of my dataset into factors for plotting
> purposes). My goal is to let the code scan the content of each factor of my
> data.frame during or after the read.table step and reorder their levels
> automatically without having to ask the user to hard-code the level order.
>
> In a way, my problem is more related to the way the factor levels are
> ordered than to the read.table function, although I guess there is a link...
>
> Gabor Grothendieck wrote:
> > It's not clear from your description what you want.  Could you be a bit more
> > specific, including an example?

> > On 8/28/07, Sébastien <pomchip at free.fr> wrote:
> > > Thanks Gabor, I have two questions:
> > >
> > > 1- Is there any difference between your code and the following one, with
> > > regard to Fld2?
> > >
> > > ### test ###
> > > Input <- "Fld1 Fld2
> > > 10 A
> > > 20 B
> > > 30 C
> > > 40 A
> > > "
> > > DF <- read.table(textConnection(Input), header = TRUE)
> > > DF$Fld2 <- factor(DF$Fld2, levels = c("C", "A", "B"))
> > >
> > > 2- Do you see any way to bring flexibility to your method? It looks to me
> > > as if, at this stage, I have to i) know the order of my levels before I
> > > read the table and ii) create one class per factor. My problem is that I
> > > am not really working on a specific dataset. My goal is to develop R
> > > scripts capable of handling datasets which have various contents but
> > > similar structures. So, I really need to minimize the quantity of
> > > "user-specific" code.
> > >
> > > Sebastien

> > > Gabor Grothendieck wrote:
> > > > You can create your own class and pass that to read.table.  In the
> > > > example below Fld2 is read in with factor levels C, A, B in that order.
> > > >
> > > > library(methods)
> > > > setClass("my.levels")
> > > > setAs("character", "my.levels",
> > > >   function(from) factor(from, levels = c("C", "A", "B")))
> > > >
> > > > ### test ###
> > > > Input <- "Fld1 Fld2
> > > > 10 A
> > > > 20 B
> > > > 30 C
> > > > 40 A
> > > > "
> > > > DF <- read.table(textConnection(Input), header = TRUE,
> > > >   colClasses = c("numeric", "my.levels"))
> > > > str(DF)
> > > > # or
> > > > DF <- read.table(textConnection(Input), header = TRUE,
> > > >   colClasses = list(Fld2 = "my.levels"))
> > > > str(DF)


> > > > On 8/28/07, Sébastien <pomchip at free.fr> wrote:
> > > > > Dear R-users,
> > > > >
> > > > > I have found this not-so-recent post in the archives -
> > > > > http://tolstoy.newcastle.edu.au/R/devel/00a/0291.html - while I was
> > > > > looking for a particular way to reorder factor levels. The question
> > > > > addressed by the author was whether the read.table function could be
> > > > > modified to order the levels of newly created factors "according to
> > > > > the order that they appear in the data file". Exactly what I am
> > > > > looking for.
> > > > >
> > > > > As there was no reply to this post, I wonder if any move has been made
> > > > > towards the implementation of this suggestion. A quick look at
> > > > > ?read.table tells me that if this option was implemented, it was not
> > > > > in the read.table function...
> > > > >
> > > > > Sebastien
> > > > >
> > > > > PS: I am sorry to post so many messages on the list, but I am learning
> > > > > R (basically by trial and error ;-) ) and no one around me has even a
> > > > > slight notion about it...

> > > > > ______________________________________________
> > > > > R-help at stat.math.ethz.ch mailing list
> > > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > > PLEASE do read the posting guide
> > > > > http://www.R-project.org/posting-guide.html
> > > > > and provide commented, minimal, self-contained, reproducible code.

