[R] Assign factor and levels inside function

Tim Howard tghoward at gw.dec.state.ny.us
Fri Apr 22 14:39:28 CEST 2005


Aha!
   You've just opened the door to another level for this blundering R
user.  I even went back to my well-used copy of "An Introduction to R"
to see where I missed this standard approach for processing new data. 
It is not stated outright, but it is certainly alluded to in many of the
function examples.  I don't know why I was stuck in that rut.

I'm sure 99.9% of you on this list know this, but... To be clear for
anyone searching these archives later:  Don't bother to ask your
function to make assignments to pos=1 (the global environment), just do
the assignment yourself when calling the function. For example, instead
of coding a function call like this:

processData(dat)

and having the function itself assign the processed data to pos=1, have
the function return the processed data and make the assignment when
calling it:

dat <- processData(dat)


Thanks for being gentle on me, Andy.

Tim

>>> "Liaw, Andy" <andy_liaw at merck.com> 4/21/2005 9:57:22 PM >>>
Tim,

> From: Tim Howard 
> 
> Andy, 
>   Thank you for the help. Yes, my question really did seem like I was
> going through a lot of unnecessary steps just to define levels of a
> variable. But that was just for the example. In my application, I bring
> new datasets into R on a daily basis. While the data differs, the
> variables are the same, and the categorical variables have the same
> levels. So I find myself daily applying the same factor and level
> definitions (by cutting and pasting the large chunk of commands from a
> text file). It really would be simpler to have it wrapped up in a
> function.  That's why I asked the question about putting this into a
> function.
>   Upon reading your answer, I thought maybe I could use your example
> and use the super-assignment '<<-' in the function. But, your method
> assigns levels, but does not define the var as a factor (interesting!).
> 
> >  levels(y$one) <- seq(1, 9, by=2)
> > y$one
> [1] 1 1 3 3 5 7
> attr(,"levels")
> [1] 1 3 5 7 9
> > is.factor(y$one)
> [1] FALSE

Ouch!  "levels<-" is generic, and the default method simply attaches the
levels attribute to the object.  You need to coerce the object into a
factor explicitly.
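
As a minimal sketch of the difference (the names x and f are illustrative
and not from the thread; the values mirror the y$one column), assigning
levels to a plain numeric vector only sets an attribute, whereas factor()
builds an actual factor:

```r
# Illustrative names; the values mirror the y$one column in this thread.
x <- c(1, 1, 3, 3, 5, 7)

# The default "levels<-" method only attaches a "levels" attribute.
levels(x) <- seq(1, 9, by = 2)
is.factor(x)   # FALSE: x is still a numeric vector

# factor() performs the coercion and records the full set of levels.
f <- factor(c(1, 1, 3, 3, 5, 7), levels = seq(1, 9, by = 2))
is.factor(f)   # TRUE
levels(f)      # character labels "1" "3" "5" "7" "9"
```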

> Unfortunately, whenever I try to use <<- with the dataframe as the
> variable, I get an error message: 
> 
> > fncFact <- function(datfra){
> + datfra$one <<- factor(datfra$one, levels=c(1,3,5,7,9))
> + }
> > fncFact(y)
> Error in fncFact(y) : Object "datfra" not found
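
A likely explanation, offered as a sketch rather than as part of the
original exchange: in a replacement call such as datfra$one <<- ..., R
looks for the object being modified (datfra) starting in the environment
enclosing the function, not in the function itself. The argument datfra
exists only inside the function, so that lookup fails. The snippet below
reproduces the error; it assumes no object named datfra exists in the
global environment, and res is an illustrative name.

```r
y <- data.frame(one = c(1, 1, 3, 3, 5, 7),
                two = c(2, 2, 6, 6, 8, 8))

fncFact <- function(datfra) {
    # "<<-" searches for an existing `datfra` outside this function;
    # the local argument is not considered as the assignment target.
    datfra$one <<- factor(datfra$one, levels = c(1, 3, 5, 7, 9))
}

# Capture the error message instead of stopping the script.
# Assumes there is no global object called `datfra`.
res <- tryCatch(fncFact(y), error = function(e) conditionMessage(e))
res                # an "object ... not found" message mentioning datfra
is.factor(y$one)   # FALSE: y itself is untouched
```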

I believe the canonical way of doing something like this in R is
something along the lines of:

processData <- function(dat) {
    dat$f1 <- factor(dat$f1, levels=...)
    ...  ## any other manipulations you want to do
    dat
}

Then when you get new data, you just do:

newData <- processData(newData)
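
To make the pattern concrete, here is a sketch that fills in the elided
pieces using the sample data frame from earlier in this thread. The column
name one and the levels c(1, 3, 5, 7, 9) come from Tim's example, not from
the code above, so treat them as assumptions.

```r
processData <- function(dat) {
    # Convert the column to a factor with a fixed set of levels,
    # including levels that may not occur in today's data.
    dat$one <- factor(dat$one, levels = c(1, 3, 5, 7, 9))
    dat   # return the modified copy
}

y <- data.frame(one = c(1, 1, 3, 3, 5, 7),
                two = c(2, 2, 6, 6, 8, 8))

newData <- processData(y)

is.factor(newData$one)   # TRUE
levels(newData$one)      # "1" "3" "5" "7" "9"
is.factor(y$one)         # FALSE: y changes only if you reassign it
```

Because the function works on a copy of its argument, the caller decides
where the result goes, for example y <- processData(y).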

HTH,
Andy

> 
> Tim
> 
> >>> "Liaw, Andy" <andy_liaw at merck.com> 4/20/2005 4:03:24 PM >>>
> Wouldn't it be easier to do this?
> 
> > levels(y$one) <- seq(1, 9, by=2)
> > y$one
> [1] 1 1 3 3 5 7
> attr(,"levels")
> [1] 1 3 5 7 9
> 
> Andy
> 
> > From: Tim Howard
> > 
> > R-help,
> >   After cogitating for a while, I finally figured out how to define a
> > data.frame column as factor and assign the levels within a function...
> > BUT I still need to pass the data.frame and its name separately. I can't
> > seem to find any other way to pass the name of the data.frame, rather
> > than the data.frame itself.  Any suggestions on how to go about it?  Is
> > there something like value(object) or name(object) that I can't find?
> > 
> > #sample dataframe for this example
> > y <- data.frame(
> >  one=c(1,1,3,3,5,7),
> >  two=c(2,2,6,6,8,8))
> > 
> > > levels(y$one)   # check out levels
> > NULL
> > 
> > # the function I've come up with
> > fncFact <- function(datfra, datfraNm){
> > datfra$one <- factor(datfra$one, levels=c(1,3,5,7,9))
> > assign(datfraNm, datfra, pos=1)
> > }
> > 
> > > fncFact(y, "y")
> > > levels(y$one)
> > [1] "1" "3" "5" "7" "9"
> > 
> > I suppose only for aesthetics and simplicity, I'd like to pass only
> > the data.frame and get the same result.
> > Thanks in advance,
> > Tim Howard
> > 
> > 
> > > version
> >          _              
> > platform i386-pc-mingw32
> > arch     i386           
> > os       mingw32        
> > system   i386, mingw32  
> > status                  
> > major    2              
> > minor    0.1            
> > year     2004           
> > month    11             
> > day      15             
> > language R
> > 
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help 
> > PLEASE do read the posting guide! 
> > http://www.R-project.org/posting-guide.html 
> ------------------------------------------------------------------------------
> Notice:  This e-mail message, together with any attachments, contains
> information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station,
> New Jersey, USA 08889), and/or its affiliates (which may be known
> outside the United States as Merck Frosst, Merck Sharp & Dohme or MSD
> and in Japan, as Banyu) that may be confidential, proprietary
> copyrighted and/or legally privileged. It is intended solely for the use
> of the individual or entity named on this message.  If you are not the
> intended recipient, and have received this message in error, please
> notify us immediately by reply e-mail and then delete it from your
> system.
> ------------------------------------------------------------------------------
> 
> 
> 


