[R] map two names into one
Kenn Konstabel
lebatsnok at gmail.com
Wed Sep 26 11:52:20 CEST 2012
It may be easy or difficult depending on what your data are like.
"GALAXY ACE S 5830" vs "S 5830 GALAXY ACE"
One easy and reasonably general way would be to split each name into
its "words" (4 of them here) and then check whether set 2 contains
exactly the words in set 1, possibly in a different order.
x1 <- "GALAXY ACE S 5830"
x2 <- "S 5830 GALAXY ACE"
x3 <- "S 5830 GALAXY ZOMBIE"
divide <- function(x) strsplit(x, " ")[[1]]
check <- function(x, y) all(divide(x) %in% divide(y))
check(x1,x2)
# [1] TRUE
check(x1,x3)
# [1] FALSE
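The check above only tells you whether two names match (and `%in%` tests one direction only; `setequal()` would test both). To actually map both spellings onto one name, one option is to build an order-independent key by sorting the words of each name and then merge the two data sets on that key. This is only a sketch: the helper `canon` and the data frames `prices` and `volumes` with their columns are made up for illustration.

```r
# Hypothetical helper: sort the words of each name so word order no longer matters
canon <- function(x) {
  vapply(strsplit(trimws(x), " +"),
         function(w) paste(sort(w), collapse = " "), character(1))
}

# Made-up example data standing in for the two imported data sets
prices  <- data.frame(name = c("GALAXY ACE S 5830", "GALAXY ZOMBIE S 5830"),
                      price = c(199, 249), stringsAsFactors = FALSE)
volumes <- data.frame(name = c("S 5830 GALAXY ACE", "S 5830 GALAXY ZOMBIE"),
                      volume = c(1000, 10), stringsAsFactors = FALSE)

# Same words in any order give the same key
prices$key  <- canon(prices$name)
volumes$key <- canon(volumes$name)

# Join price and volume information on the key
merged <- merge(prices, volumes, by = "key", suffixes = c(".price", ".volume"))
```

Names that differ by more than word order (typos, extra words) would still not match with this approach.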
Or you could try reading in your data in a different way so that "S",
"GALAXY", "ACE", and "5830" end up in different variables (if all
product names have an identical structure, i.e. 4 elements; or is S 5830
supposed to be the price?). Or build a catalogue of all possible
product names and then compare each name to it, etc.
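The catalogue idea could look roughly like this: keep one preferred spelling per product and look up each incoming name by its set of words. The catalogue contents and the helper names `word_key` and `to_catalogue` are invented for this sketch; names with no catalogue entry come back as NA.

```r
# Hypothetical catalogue of preferred product names
catalogue <- c("GALAXY ACE S 5830", "GALAXY ZOMBIE S 5830")

# Order-independent key: the sorted words of a name pasted together
word_key <- function(x) {
  vapply(strsplit(trimws(x), " +"),
         function(w) paste(sort(w), collapse = " "), character(1))
}

# Map each name to its catalogue spelling; NA if no entry has the same words
to_catalogue <- function(x) catalogue[match(word_key(x), word_key(catalogue))]

mapped <- to_catalogue(c("S 5830 GALAXY ACE", "ACE GALAXY S 5830",
                         "S 5830 GALAXY MINI"))
# mapped is "GALAXY ACE S 5830", "GALAXY ACE S 5830", NA
```

The NA results are a convenient list of names that still need to be checked by hand or added to the catalogue.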
htmh
On 9/26/12, Tammy Ma <metal_licaling at live.com> wrote:
>
> Dear R user:
>
>
> I have got the following problem:
>
> I have imported two data sets into R: one set includes price information,
> the other includes volume information. But I noticed that the words in the
> product names appear in a different order,
>
> for instance,
>
> in one data set,
>
> "GALAXY ACE S 5830"
>
> in another one,
>
> it is "S 5830 GALAXY ACE"
>
> Both represent the same product. How do I map the two names into one in R?
>
> There are so many product names with this problem. I hope there is some
> mechanism which can automatically map those. Thanks for your help.
>
>
> Kind regards,
> Tammy