[R] assign factor levels based on list
William Dunlap
wdunlap at tibco.com
Wed Feb 9 22:41:58 CET 2011
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Tim Howard
> Sent: Wednesday, February 09, 2011 12:44 PM
> To: r-help at r-project.org
> Subject: [R] assign factor levels based on list
>
> All,
>
> Given a data frame and a list containing factor definitions
> for certain columns, how can I apply those definitions from
> the list, rather than doing it the standard way, as noted
> below. I'm lost in the world of do.call, assign, paste, and
> can't find my way through. For example:
>
> #set up df
> y <- data.frame(colOne = c(1,2,3),
>                 colTwo = c("apple","pear","orange"))
>
> factor.defs <- list(colOne = list(name = "colOne",
>                                   lvl = c(1,2,3,4,5,6)),
>                     colTwo = list(name = "colTwo",
>                                   lvl = c("apple","pear","orange","fig","banana")))
Why not use the following, simpler format?
my.factor.defs <- list(colOne = c(1,2,3,4,5,6),
                       colTwo = c("apple", "pear", "orange", "fig",
                                  "banana"))
Do you really want to support a case like the following, where an
element of the list describes a column other than the one it is named after?
list(colOne = list(name = "anotherColumn", lvl = c(1,2,3,4,5,6)))
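If you do need to keep the nested format (say, because something else produces it), it can be flattened into the simpler one in a single step. The sketch below assumes every entry has exactly the two fields 'name' and 'lvl' shown in the question; toSimpleDefs() is a hypothetical helper name, not an existing R function. It keys each level vector by the entry's 'name' field, so the "anotherColumn" case above ends up keyed by the column it actually describes.

```r
factor.defs <- list(colOne = list(name = "colOne", lvl = c(1, 2, 3, 4, 5, 6)),
                    colTwo = list(name = "colTwo",
                                  lvl = c("apple", "pear", "orange", "fig", "banana")))

# Hypothetical helper: turn list(key = list(name = ..., lvl = ...)) into
# list(name = lvl).  The result is keyed by each entry's 'name' field,
# not by the outer list's names.
toSimpleDefs <- function(defs) {
  levelsList <- lapply(defs, function(d) d$lvl)
  names(levelsList) <- vapply(defs, function(d) d$name, character(1))
  levelsList
}

my.factor.defs <- toSimpleDefs(factor.defs)
str(my.factor.defs)
```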
> #A standard way to define levels
> y$colTwo <- factor(y$colTwo, levels =
>                    c("apple","pear","orange","fig","banana"))
>
> # I'd like to use the definitions locally but also pass them
> # (but not the data) to a function, so, rather than defining
> # each manually each time, I'd like to loop through the columns,
> # call them by name, find the definitions in the list and use
> # them from there. Before I try to loop or use some form of
> # apply, I'd like to get a single factor definition working.
First write a function that takes a data.frame and a list
of desired levels for each column and returns a new data.frame.
E.g., if you use the simpler format for the levels list given
above, the following might work well enough (it does no
error checking):
assignNewLevelsToDataFrameColumns <- function(x, levelsList) {
    for(colName in names(levelsList)) {
        # note that x$name is equivalent to x[["name"]], so
        # if you want to use a variable as the name, use [[.
        x[[colName]] <- factor(x[[colName]],
                               levels=levelsList[[colName]])
    }
    x
}
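Since that version does no error checking, here is one possible sketch of what checking could look like. The function name assignLevelsChecked() is hypothetical, and which checks matter is an assumption on my part; the main one guards against factor() silently turning any value that is not among the supplied levels into NA.

```r
# Hypothetical checked variant of assignNewLevelsToDataFrameColumns().
assignLevelsChecked <- function(x, levelsList) {
  stopifnot(is.data.frame(x), is.list(levelsList))
  missingCols <- setdiff(names(levelsList), names(x))
  if (length(missingCols) > 0)
    stop("columns not found in data frame: ", paste(missingCols, collapse = ", "))
  for (colName in names(levelsList)) {
    lvls <- levelsList[[colName]]
    vals <- unique(x[[colName]])
    vals <- as.character(vals[!is.na(vals)])
    # factor() would quietly turn values missing from 'lvls' into NA,
    # so stop and report them instead of losing data
    bad <- setdiff(vals, as.character(lvls))
    if (length(bad) > 0)
      stop("column '", colName, "' has values not in its levels: ",
           paste(bad, collapse = ", "))
    x[[colName]] <- factor(x[[colName]], levels = lvls)
  }
  x
}

y <- data.frame(colOne = c(1, 2, 3), colTwo = c("apple", "pear", "orange"))
my.factor.defs <- list(colOne = c(1, 2, 3, 4, 5, 6),
                       colTwo = c("apple", "pear", "orange", "fig", "banana"))
fixedY <- assignLevelsChecked(y, my.factor.defs)
str(fixedY)
# Stops with an error, because "orange" is not among the given levels:
# assignLevelsChecked(y, list(colTwo = c("apple", "pear")))
```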
Test it:
> fixedY <- assignNewLevelsToDataFrameColumns(y, my.factor.defs)
> fixedY
  colOne colTwo
1      1  apple
2      2   pear
3      3 orange
> str(fixedY)
'data.frame': 3 obs. of 2 variables:
$ colOne: Factor w/ 6 levels "1","2","3","4",..: 1 2 3
$ colTwo: Factor w/ 5 levels "apple","pear",..: 1 2 3
Do
> y <- assignNewLevelsToDataFrameColumns(y, my.factor.defs)
if you want to overwrite the old y.
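Since the original question also mentioned "some form of apply", the same result can be had without an explicit for loop by using Map(). This is an alternative to the loop above, not a replacement for it; it assumes every name in my.factor.defs is a column of y.

```r
y <- data.frame(colOne = c(1, 2, 3),
                colTwo = c("apple", "pear", "orange"))
my.factor.defs <- list(colOne = c(1, 2, 3, 4, 5, 6),
                       colTwo = c("apple", "pear", "orange", "fig", "banana"))

# Map() calls factor(y[[col]], levels = my.factor.defs[[col]]) for each
# column named in my.factor.defs and returns the results as a list,
# which is then assigned back into those same columns of y.
cols <- names(my.factor.defs)
y[cols] <- Map(factor, y[cols], levels = my.factor.defs[cols])
str(y)
```

The loop version is arguably easier to read and to extend with checks; the Map() version is mostly a matter of taste.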
Now, if you want a function that appears to change the data.frame
you give it, use a replacement function. If you want to use the syntax
> func(y) <- newStuff
then the function should be called `func<-` and its last argument
must be called 'value' (newStuff will be passed via value=newStuff).
E.g.,
`func<-` <- function(x, value) {
    alteredX <- assignNewLevelsToDataFrameColumns(x, value)
    alteredX
}
and use it as
> func(y) <- my.factor.defs
> str(y)
'data.frame': 3 obs. of 2 variables:
$ colOne: Factor w/ 6 levels "1","2","3","4",..: 1 2 3
$ colTwo: Factor w/ 5 levels "apple","pear",..: 1 2 3
The first command is essentially translated into
y <- `func<-`(y, value=my.factor.defs)
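To see that translation in action, the sketch below defines a replacement function under the hypothetical name `colLevels<-` (same loop as above) and checks that the replacement syntax and the explicit call give identical results. It also shows that the original data frame is untouched: only the variable on the left of the arrow is rebound.

```r
# Hypothetical replacement function; the name colLevels is illustrative only.
`colLevels<-` <- function(x, value) {
  for (colName in names(value))
    x[[colName]] <- factor(x[[colName]], levels = value[[colName]])
  x
}

y <- data.frame(colOne = c(1, 2, 3), colTwo = c("apple", "pear", "orange"))
defs <- list(colOne = 1:6,
             colTwo = c("apple", "pear", "orange", "fig", "banana"))

y1 <- y
colLevels(y1) <- defs                   # replacement syntax
y2 <- `colLevels<-`(y, value = defs)    # roughly what R evaluates
identical(y1, y2)
is.factor(y$colOne)                     # y itself was not modified
```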
If you write a replacement function, it is nice to create a matching
extractor function called 'func'. E.g.,
> func <- function(x) lapply(x, levels)
> func(y)
$colOne
[1] "1" "2" "3" "4" "5" "6"
$colTwo
[1] "apple" "pear" "orange" "fig" "banana"
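The extractor/replacement pair also covers the original goal of passing the definitions, but not the data, around: take the levels from one data set and apply them to another. One caveat, which is my own addition: func() returns NULL for columns that are not factors, and factor(x, levels=NULL) would turn such a column into NA values, so the sketch below drops the NULL entries first. The template and newData objects are made-up examples.

```r
assignNewLevelsToDataFrameColumns <- function(x, levelsList) {
  for (colName in names(levelsList))
    x[[colName]] <- factor(x[[colName]], levels = levelsList[[colName]])
  x
}
`func<-` <- function(x, value) assignNewLevelsToDataFrameColumns(x, value)
func <- function(x) lapply(x, levels)

# Example data whose factor column already carries the full set of levels,
# plus a numeric column that is not a factor
template <- data.frame(
  colTwo = factor(c("apple", "pear"),
                  levels = c("apple", "pear", "orange", "fig", "banana")),
  weight = c(0.2, 0.3))

# Keep only the entries for factor columns (func() gives NULL for the rest)
defs <- Filter(Negate(is.null), func(template))

newData <- data.frame(colTwo = c("fig", "apple"), weight = c(0.1, 0.4))
func(newData) <- defs
levels(newData$colTwo)
```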
Note that this avoids assign(), get(), eval(), etc., and
thus makes it easy to follow the flow of data in the code: only
things on the left side of the assignment arrow can get
changed.
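For the record, the attempts quoted below fail because as.name(paste("y$", ...)) does not build the expression y$colTwo; it builds a single symbol whose name happens to contain a "$", and R then looks for a variable literally called `y$colTwo`. A minimal sketch of the string-based alternative, using one entry of the nested definitions (the object name def is just for illustration):

```r
y <- data.frame(colOne = c(1, 2, 3), colTwo = c("apple", "pear", "orange"))
def <- list(name = "colTwo", lvl = c("apple", "pear", "orange", "fig", "banana"))

# A symbol literally named "y$colTwo", not the column colTwo of y
sym <- as.name(paste0("y$", def$name))
exists(as.character(sym))   # FALSE: no variable has that name

# [[ takes the column name as a string, so no assign(), get(),
# as.name() or do.call() is needed
y[[def$name]] <- factor(y[[def$name]], levels = def$lvl)
levels(y$colTwo)
```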
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
>
> # this doesn't seem to see the dataframe properly
> do.call(factor,list((paste("y$",factor.defs[2][[1]]$name,sep="")),
>                     levels=factor.defs[2][[1]]$lvl))
>
> #adding "as.name" doesn't help
> do.call(factor,list(as.name(paste("y$",factor.defs[2][[1]]$name,sep="")),
>                     levels=factor.defs[2][[1]]$lvl))
>
> # Here's my attempt to mimic the standard way, using assign.
> # Ha! what a joke.
> assign(as.name(paste("y$",factor.defs[2][[1]]$name,sep="")),
>        do.call(factor,
>                list(as.name(paste("y$",factor.defs[2][[1]]$name,sep="")),
>                     levels = factor.defs[2][[1]]$lvl)))
> ## Error in function (x = character(), levels, labels = levels, exclude = NA, :
> ##   object 'y$colTwo' not found
> Any help or perspective (or better way from the beginning!)
> would be greatly appreciated.
> Thanks in advance!
> Tim
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>