[R] Mapping from one vector to another

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Thu Jul 17 18:06:45 CEST 2014


You ask about generic methods for introducing alternate values for 
factors, and some of the other responses address this quite efficiently.

However, a factor's levels are meaningful only within one vector at a time,
since another vector may have additional or missing values relative to the
first. For example, you used the "sample" function, which is not guaranteed
to select at least one of each of the four letters in L4. And what if the
data contains values that the mapping doesn't address?
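A small sketch of both risks, assuming a hypothetical draw that happened to miss "A" and "C". A factor built from it carries only the levels actually present, so indexing a value vector by that factor silently uses the wrong positions. A character lookup stays aligned and returns NA for a value the mapping doesn't cover.

```r
# Hypothetical draw that happened to miss "A" and "C"
x <- c("B", "B", "D")
f <- factor(x)
levels(f)                 # "B" "D": only the values actually present
# The mapping is intended for A, B, C, D in that order
vals <- c(8, 11, 3, 2)
# Indexing by a factor uses its integer codes (1 = "B", 2 = "D"), so this is misaligned
vals[f]                   # 8 8 11, not the intended 11 11 2
# Looking up the character values instead stays aligned; an unmapped value gives NA
map <- c(A = 8, B = 11, C = 3, D = 2)
unname(map[c("B", "D", "E")])   # 11 2 NA
```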

For any work in which I am dealing with categorical data in multiple
places (e.g. your "d" data frame and whatever data structure you use
to define your mapping) I prefer NOT to work with factors until all of
my categories of data are moved into one vector (typically a column
in a data frame). Rather, I work with character vectors during the
data manipulation phase and only convert to factor when I start
analyzing or displaying the data.
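As a rough illustration of that workflow, assuming two hypothetical sources of category values: combine them while they are still character vectors, then convert to a factor once, stating the full set of levels explicitly so that categories absent from the data are still represented.

```r
# Hypothetical categories arriving from two separate sources, kept as character
src1 <- c("B", "D", "D")
src2 <- c("A", "B", "C")
# Combine while still character, so there are no level sets to reconcile
all_fac <- c(src1, src2)
# Convert once, at analysis time, with the complete set of levels given explicitly
f <- factor(all_fac, levels = LETTERS[1:4])
table(f)                  # A: 1, B: 2, C: 1, D: 2
```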

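With this in mind, my general flow looks something like this:

# keep categories as character while manipulating the data
d <- data.frame( x = 1, y = 1:10, fac = fac, stringsAsFactors=FALSE )
# the mapping, as a lookup table keyed by the same column name
mp <- data.frame( fac=LETTERS[1:4], var=c(8,11,3,2), stringsAsFactors=FALSE )
# left join; rows whose fac is absent from mp get NA (result is sorted by fac)
d2 <- merge( d, mp, all.x=TRUE )
d2$fac <- factor( d2$fac ) # optional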
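One detail worth handling: merge() sorts its result by the key column by default, so the original row order is lost. The sketch below uses the fac values printed in the question instead of sample() so the result is reproducible, names the mapped column var as the question asked, and restores the order using y. The check on NA is one possible way to catch categories the mapping doesn't cover.

```r
# Data from the question, fixed here instead of sample() so the result is reproducible
fac <- c("B", "B", "D", "A", "C", "D", "C", "B", "B", "B")
d <- data.frame(x = 1, y = 1:10, fac = fac, stringsAsFactors = FALSE)
mp <- data.frame(fac = LETTERS[1:4], var = c(8, 11, 3, 2), stringsAsFactors = FALSE)
# merge() puts the key column first and sorts by it, so reorder rows and columns afterwards
d2 <- merge(d, mp, all.x = TRUE)
d2 <- d2[order(d2$y), c("x", "y", "fac", "var")]
rownames(d2) <- NULL
# Any category missing from mp would show up as NA here rather than vanishing silently
stopifnot(!anyNA(d2$var))
d2$var                    # 11 11 2 8 3 2 3 11 11 11
```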

If you are already in the analysis phase and are not pulling data from
multiple external sources, you may have already confirmed the completeness
and range of the values you have to work with. In that case, one of the
other, more efficient methods may still be the better choice for this
specific task.
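The other responses are not quoted here, so the following is only a sketch of a common base-R lookup, not necessarily what they proposed. match() returns the position of each value in the key vector (NA if absent) and keeps the original row order. The named-vector form is equivalent for character data.

```r
fac <- c("B", "B", "D", "A", "C", "D", "C", "B", "B", "B")
d <- data.frame(x = 1, y = 1:10, fac = fac, stringsAsFactors = FALSE)
keys <- LETTERS[1:4]
vals <- c(8, 11, 3, 2)
# Position of each fac value in keys; unmatched values give NA, row order is unchanged
d$var <- vals[match(d$fac, keys)]
# Equivalent lookup through a named vector; unname() drops names copied from the index
lookup <- setNames(vals, keys)
var2 <- unname(lookup[d$fac])
d$var
```

Note that match() converts a factor argument to character before matching, so it stays correct even if d$fac is a factor. Indexing lookup[d$fac] with a factor would use the integer codes instead, which is one more reason to keep the column as character during this step.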

Hadley Wickham's "tidy data" [1] principles address this concern more 
thoroughly than I have.

[1] Search for this phrase; the paper appears to be a work in progress.

On Thu, 17 Jul 2014, Gang Chen wrote:

> Suppose I have the following dataframe:
>
> L4 <- LETTERS[1:4]
> fac <- sample(L4, 10, replace = TRUE)
> (d <- data.frame(x = 1, y = 1:10, fac = fac))
>
>     x  y  fac
> 1  1  1   B
> 2  1  2   B
> 3  1  3   D
> 4  1  4   A
> 5  1  5   C
> 6  1  6   D
> 7  1  7   C
> 8  1  8   B
> 9  1  9   B
> 10 1 10   B
>
> I'd like to add another column 'var' that is defined based on the
> following mapping of column 'fac':
>
> A -> 8
> B -> 11
> C -> 3
> D -> 2
>
> How can I achieve this in an elegant way (with a generic approach for
> any length)?
>
> Thanks,
> Gang

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
