[R] map two names into one

Kenn Konstabel lebatsnok at gmail.com
Wed Sep 26 11:52:20 CEST 2012


It may be easy or difficult depending on what your data are like.

"GALAXY ACE S 5830" vs "S 5830 GALAXY ACE"

One easy and reasonably general way would be to split each name into
its "words" (four in this example) and then check whether the second
name contains all the words of the first, possibly in a different order.

x1 <- "GALAXY ACE S 5830"
x2 <- "S 5830 GALAXY ACE"
x3 <- "S 5830 GALAXY ZOMBIE"

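# split a single name into its space-separated words
divide <- function(x) strsplit(x, " ")[[1]]
# TRUE if every word of x also occurs in y (one-directional;
# use setequal(divide(x), divide(y)) for a strict two-way match)
check <- function(x, y) all(divide(x) %in% divide(y))
check(x1,x2)
# [1] TRUE
check(x1,x3)
# [1] FALSE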
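
To go from a pairwise check to actually combining the two data sets,
one possibility is to turn each name into an order-independent key
(words sorted and pasted back together) and merge on that key. This is
only a rough sketch: the data frames `price` and `volume`, their columns
and values, and the helper `canon` are all made up for illustration.

```r
# Hypothetical example data; the real column names and values will differ.
price  <- data.frame(name  = c("GALAXY ACE S 5830", "GALAXY MINI S 5570"),
                     price = c(199, 149), stringsAsFactors = FALSE)
volume <- data.frame(name   = c("S 5830 GALAXY ACE", "S 5570 GALAXY MINI"),
                     volume = c(1200, 800), stringsAsFactors = FALSE)

# canon() is an illustrative helper: split each name on whitespace,
# sort the words, and paste them back so word order no longer matters.
# method = "radix" makes the sort independent of the session locale.
canon <- function(x) {
  words <- strsplit(trimws(x), "[[:space:]]+")
  vapply(words,
         function(w) paste(sort(w, method = "radix"), collapse = " "),
         character(1))
}

price$key  <- canon(price$name)
volume$key <- canon(volume$name)

# Merge the two data sets on the order-independent key.
both <- merge(price, volume, by = "key", suffixes = c(".price", ".volume"))
```

Names that differ by more than word order (typos, extra words) will not
match this way and would need to be handled separately.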

Or you could try reading in your data in a different way so that "S",
"GALAXY", "ACE", and "5830" end up in different variables (if all
product names have an identical structure, i.e. 4 elements; or is S 5830
supposed to be the price?). Or build a catalogue of all possible
product names and then compare each name to it, etc.
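
The catalogue idea could look roughly like the following. The catalogue
entries and the helpers `words` and `lookup` are hypothetical, and the
comparison assumes names are separated by single spaces.

```r
# Hypothetical catalogue of canonical product names.
catalogue <- c("GALAXY ACE S 5830", "GALAXY MINI S 5570")

# Split one name into its space-separated words.
words <- function(x) strsplit(x, " ", fixed = TRUE)[[1]]

# Return the catalogue entry that has the same set of words as `name`,
# or NA if no entry (or more than one entry) matches.
lookup <- function(name, catalogue) {
  hit <- vapply(catalogue,
                function(entry) setequal(words(name), words(entry)),
                logical(1))
  if (sum(hit) == 1) catalogue[hit] else NA_character_
}

lookup("S 5830 GALAXY ACE", catalogue)     # matches the first entry
lookup("S 5830 GALAXY ZOMBIE", catalogue)  # no match, gives NA
```

Names that come back as NA can then be inspected by hand and, if
appropriate, added to the catalogue.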

htmh





On 9/26/12, Tammy Ma <metal_licaling at live.com> wrote:
>
> Dear R user:
>
>
> I have got the following problem:
>
> I have imported two data sets into R: one set includes price information,
> the other includes volume information. But I noticed that the word order
> in the product names differs between them,
>
> for instance,
>
> in one data set,
>
> "GALAXY ACE S 5830"
>
> in another one,
>
> it is "S 5830 GALAXY ACE"
>
> Both represent the same product. How do I map the two names into one in R?
>
> There are many product names with this problem. I hope there is some
> mechanism which can automatically map them. Thanks for your help.
>
>
> Kind regards,
> Tammy



