[R] Assign factor and levels inside function
Liaw, Andy
andy_liaw at merck.com
Fri Apr 22 03:57:22 CEST 2005
Tim,
> From: Tim Howard
>
> Andy,
> Thank you for the help. Yes, my question really did seem like I was
> going through a lot of unnecessary steps just to define levels of a
> variable. But that was just for the example. In my application, I bring
> new datasets into R on a daily basis. While the data differs, the
> variables are the same, and the categorical variables have the same
> levels. So I find myself daily applying the same factor and level
> definitions (by cutting and pasting the large chunk of commands from a
> text file). It really would be simpler to have it wrapped up in a
> function. That's why I asked the question about putting this into a
> function.
> Upon reading your answer, I thought maybe I could use your example
> and use the super-assignment '<<-' in the function. But, your method
> assigns levels, but does not define the var as a factor
> (interesting!).
>
> > levels(y$one) <- seq(1, 9, by=2)
> > y$one
> [1] 1 1 3 3 5 7
> attr(,"levels")
> [1] 1 3 5 7 9
> > is.factor(y$one)
> [1] FALSE
Ouch! "levels<-" is generic, and the default method simply attaches the
levels attribute to the object. You need to coerce the object into a factor
explicitly.
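To make the difference concrete, here is a minimal sketch using the sample
data from the original post. The variable name x is only for illustration.

```r
# Sample data frame from the original post.
y <- data.frame(one = c(1, 1, 3, 3, 5, 7),
                two = c(2, 2, 6, 6, 8, 8))

# Setting levels on a plain numeric vector only attaches an attribute;
# the object does not become a factor.
x <- y$one
levels(x) <- seq(1, 9, by = 2)
is.factor(x)        # FALSE

# Coercing explicitly with factor() creates a real factor.
y$one <- factor(y$one, levels = seq(1, 9, by = 2))
is.factor(y$one)    # TRUE
levels(y$one)       # "1" "3" "5" "7" "9"
```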
> Unfortunately, whenever I try to use <<- with the dataframe as the
> variable, I get an error message:
>
> > fncFact <- function(datfra){
> + datfra$one <<- factor(datfra$one, levels=c(1,3,5,7,9))
> + }
> > fncFact(y)
> Error in fncFact(y) : Object "datfra" not found
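As far as I understand it, the error arises because "<<-" does not look in
the function's own frame: it searches the enclosing environments for an
existing object called datfra, and the only datfra is the local argument.
A minimal sketch of that behaviour, assuming a fresh session in which no
global object named datfra exists (res is just an illustrative name):

```r
y <- data.frame(one = c(1, 1, 3, 3, 5, 7))

fncFact <- function(datfra) {
  # "<<-" looks for datfra outside this function, not among its arguments.
  datfra$one <<- factor(datfra$one, levels = c(1, 3, 5, 7, 9))
}

# Capture the error message instead of stopping the script.
res <- tryCatch(fncFact(y), error = function(e) conditionMessage(e))
res

# The caller's data frame is left untouched.
is.factor(y$one)    # FALSE
```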
I believe the canonical way of doing something like this in R is something
along the lines of:
processData <- function(dat) {
    dat$f1 <- factor(dat$f1, levels=...)
    ...  ## any other manipulations you want to do
    dat
}
Then when you get new data, you just do:
newData <- processData(newData)
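As a filled-in sketch of the same pattern, applied to the sample data from
the original post: the column name one and the levels are taken from that
example, and z is just an illustrative name. The function works on a copy,
so the result has to be assigned back to take effect.

```r
# Same pattern as above, with the column and levels from the sample data.
processData <- function(dat) {
  dat$one <- factor(dat$one, levels = c(1, 3, 5, 7, 9))
  dat  # return the modified copy
}

y <- data.frame(one = c(1, 1, 3, 3, 5, 7),
                two = c(2, 2, 6, 6, 8, 8))

# Calling the function without reassigning leaves y unchanged.
z <- processData(y)
is.factor(y$one)    # FALSE
is.factor(z$one)    # TRUE

# Assigning the result back updates y.
y <- processData(y)
levels(y$one)       # "1" "3" "5" "7" "9"
```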
HTH,
Andy
>
> Tim
>
> >>> "Liaw, Andy" <andy_liaw at merck.com> 4/20/2005 4:03:24 PM >>>
> Wouldn't it be easier to do this?
>
> > levels(y$one) <- seq(1, 9, by=2)
> > y$one
> [1] 1 1 3 3 5 7
> attr(,"levels")
> [1] 1 3 5 7 9
>
> Andy
>
> > From: Tim Howard
> >
> > R-help,
> > After cogitating for a while, I finally figured out how to define a
> > data.frame column as factor and assign the levels within a function...
> > BUT I still need to pass the data.frame and its name separately. I can't
> > seem to find any other way to pass the name of the data.frame, rather
> > than the data.frame itself. Any suggestions on how to go about it? Is
> > there something like value(object) or name(object) that I can't find?
> >
> > #sample dataframe for this example
> > y <- data.frame(
> > one=c(1,1,3,3,5,7),
> > two=c(2,2,6,6,8,8))
> >
> > > levels(y$one) # check out levels
> > NULL
> >
> > # the function I've come up with
> > fncFact <- function(datfra, datfraNm){
> > datfra$one <- factor(datfra$one, levels=c(1,3,5,7,9))
> > assign(datfraNm, datfra, pos=1)
> > }
> >
> > >fncFact(y, "y")
> > > levels(y$one)
> > [1] "1" "3" "5" "7" "9"
> >
> > I suppose only for aesthetics and simplicity, I'd like to pass only
> > the data.frame and get the same result.
> > Thanks in advance,
> > Tim Howard
> >
> >
> > > version
> > _
> > platform i386-pc-mingw32
> > arch i386
> > os mingw32
> > system i386, mingw32
> > status
> > major 2
> > minor 0.1
> > year 2004
> > month 11
> > day 15
> > language R
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide!
> > http://www.R-project.org/posting-guide.html