[R] Changing values (factors) does not change levels of that value?!
Philipp Pagel
p.pagel at wzw.tum.de
Sun Nov 16 15:25:19 CET 2008
On Sun, Nov 16, 2008 at 02:52:10PM +0100, Oliver Bandel wrote:
> OK, but I thought, when touching the data, it will
> recalculate the levels. Now I see, it does not.
No it doesn't - for the reasons given in my explanation.
> > >> x <- factor(c('A','B','C','A','C'))
> > >> y <- x[x!='C']
> > >> y
> > > [1] A B A
> > > Levels: A B C
> > >> factor(y)
> > > [1] A B A
> > > Levels: A B
>
> Sorry, this looks to me like you throw out all the values,
> where the unwanted attribute is. (?!)
Correct, that's what my example does to create a factor with
missing levels.
> That is not what I meant.
I know, but it does not matter how you got a factor with missing
levels - both problem and solution are the same.
> Or at least it's disturbing because
> you use one value, not working on a data-frame, as I do.
Not a real difference either - a data.frame is just a collection
of vectors and/or factors. So all you need to do is apply this to
whatever column holds the factor in question:
foo$bar <- factor(foo$bar)
You may want to have a look at "An Introduction to R" - especially
the section on data frames.
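To make that concrete, here is a minimal sketch with an invented data frame (foo and bar are placeholder names, as in the one-liner above, and the values are made up). It shows that subsetting rows keeps the unused level and that re-applying factor() to the column drops it.

```r
# Invented example data; 'foo' and 'bar' are placeholder names.
foo <- data.frame(bar = factor(c("A", "B", "C", "A", "C")),
                  n   = 1:5)

# Subsetting rows keeps the full set of levels, including the now unused "C".
foo <- foo[foo$bar != "C", ]
levels(foo$bar)   # "A" "B" "C"

# Rebuilding the factor keeps only the levels that actually occur.
foo$bar <- factor(foo$bar)
levels(foo$bar)   # "A" "B"
```

In R versions newer than this thread, droplevels(foo) should do the same for every factor column at once.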
> After some experimentation I found out the following solution:
>
> ========================
> weblog <- read.table("web.log") # reading the log
>
> weblog$V8[ weblog$V8 == "-" ] <- 0 # substituting "-" by 0
>
> # and now changing the levels-attribute to the new values !!
> attr(weblog$V8, "levels") <- levels( factor( as.vector(weblog$V8) ) )
weblog$V8 <- factor(weblog$V8)
is all you need.
> But after I found that, I saw that this was a detour from what I
> tried when I started, and now I do the following:
>
> ========================
> weblog <- read.table("web.log") # read in the weblog
>
> weblog$V8[ weblog$V8 == "-" ] <- 0 # substituting "-" by 0
>
> weblog$V8 <- as.numeric( as.vector(weblog$V8) ) # changing it to numeric
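Dangerous if you ever drop the as.vector(): as.numeric() applied
directly to a factor returns the internal level codes, not the values: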
> x <- factor(c(0,1,3,4,5,7))
> x
[1] 0 1 3 4 5 7
Levels: 0 1 3 4 5 7
> as.numeric(x)
[1] 1 2 3 4 5 6
See "7.10 How do I convert factors to numeric?" in the R-FAQ for
details.
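For reference, a short sketch of the two conversions usually recommended for this, applied to the same x as above. Both go through the level labels rather than the internal codes; check the FAQ entry itself for the authoritative wording.

```r
x <- factor(c(0, 1, 3, 4, 5, 7))

# Convert via the labels: factor -> character -> numeric.
v1 <- as.numeric(as.character(x))

# Same result, converting each level only once and then
# indexing by the factor's integer codes.
v2 <- as.numeric(levels(x))[as.integer(x)]

v1   # 0 1 3 4 5 7
v2   # 0 1 3 4 5 7
```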
As you are reading the data from a file anyway, the simplest
solution would probably be to use the colClasses argument of
read.table in order to get numeric values in the first place.
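A rough sketch of what that could look like. The layout of your web.log is not shown in this thread, so the four-column excerpt and the column classes below are purely hypothetical; adapt them to the real file. The "-" placeholder is declared as a missing value via na.strings (otherwise a numeric column would fail to parse) and then replaced by 0.

```r
# Hypothetical log excerpt; the real web.log will have different columns.
log_text <- "GET /index.html 200 512
GET /missing.html 404 -
GET /logo.png 200 2048"

# text= stands in for the file name here, to keep the example self-contained.
weblog <- read.table(text = log_text,
                     colClasses = c("character", "character",
                                    "integer", "numeric"),
                     na.strings = "-")

# The "-" entries arrive as NA; turn them into 0 if that is what you want.
weblog$V4[is.na(weblog$V4)] <- 0

str(weblog)
```

With a real file you would pass "web.log" as the first argument instead of text=. Whether 0 or NA is the better stand-in for "-" depends on what you plan to compute from the column.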
cu
Philipp
--
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://mips.gsf.de/staff/pagel