[R] merge on non-identical names

Henrique Dallazuanna wwwhsd at gmail.com
Tue Nov 24 23:27:28 CET 2009


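See agrep(), which does approximate ("fuzzy") string matching based on the generalized Levenshtein edit distance:

agrep("American Services", "Americam Services")  # returns 1: element 1 of x matches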
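Applied to the three example pairs from the question below, a minimal sketch shows what agrep() catches with its default max.distance = 0.1, and one caveat: it looks for approximate occurrences of the pattern *inside* each element of x, so the comparison is not symmetric. The names `pairs` and `matched` are illustrative only.

```r
# Illustrative data: the example name pairs from the question.
pairs <- list(
  c("Oxfam", "Oxfam USA"),
  c("American Services", "Americam Services"),
  c("Global Alliance for Action", "Global Alliance for the Environment")
)

# agrep(pattern, x) searches for approximate matches of `pattern` within each
# element of `x`; max.distance = 0.1 allows edits up to 10% of the pattern
# length, rounded up (here 1, 2 and 3 edits respectively).
matched <- vapply(pairs, function(p) length(agrep(p[1], p[2])) > 0,
                  logical(1))
matched
# expected: TRUE TRUE FALSE

# Not symmetric: "Oxfam USA" does not approximately occur inside "Oxfam"
# (four insertions would be needed, but only one edit is allowed).
length(agrep("Oxfam USA", "Oxfam")) > 0
# expected: FALSE
```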


On Tue, Nov 24, 2009 at 7:11 PM, j daniel <jdlecy at maxwell.syr.edu> wrote:
>
> Greetings,
>
> I need to conduct a merge on two databases containing information on
> organizations, but the organization names are often non-identical and there
> is no common unique identifier.  Does anyone know a good way to calculate a
> similarity measure on two names, or even better is there a natural language
> matching function in an R package?  I did some searches on this but must not
> know the right keywords to search.
>
> As an example, here are some possible non-identical names:
>
> Oxfam, Oxfam USA
> American Services, Americam Services  (just misspelled)
> Global Alliance for Action, Global Alliance for the Environment  (a non-match)
>
> Any suggestions are welcome!
>
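For the similarity measure asked about in the question above, one option is a normalized edit distance followed by a best-match merge. This is a sketch under stated assumptions, not a tested recipe: it uses base R's adist() (package utils, available since R 2.14.0, i.e. after this thread was written), scores each pair as 1 - distance / length of the longer name, and keeps a match only above a cut-off. The data frames `df_a`/`df_b`, the helper `name_similarity()` and the threshold 0.8 are all hypothetical.

```r
# Hypothetical data: two organization tables with no common identifier.
df_a <- data.frame(name = c("Oxfam", "American Services",
                            "Global Alliance for Action"),
                   x = 1:3, stringsAsFactors = FALSE)
df_b <- data.frame(name = c("Oxfam USA", "Americam Services",
                            "Global Alliance for the Environment"),
                   y = c("a", "b", "c"), stringsAsFactors = FALSE)

# Hypothetical helper: similarity in [0, 1] from the Levenshtein distance,
# normalized by the length of the longer name in each pair.
name_similarity <- function(a, b) {
  d <- adist(a, b)                        # length(a) x length(b) distance matrix
  len <- outer(nchar(a), nchar(b), pmax)  # longer length for each pair
  1 - d / len
}

sim <- name_similarity(df_a$name, df_b$name)

# For each row of df_a, take the most similar name in df_b and keep it only
# if its score clears the (hypothetical) threshold.
best <- apply(sim, 1, which.max)
score <- sim[cbind(seq_len(nrow(sim)), best)]
threshold <- 0.8
df_a$name_b <- ifelse(score >= threshold, df_b$name[best], NA)

# Ordinary merge on the matched key; unmatched rows keep NA in y.
merged <- merge(df_a, df_b, by.x = "name_b", by.y = "name", all.x = TRUE)
merged
```

With whole-string distances "Oxfam" vs "Oxfam USA" scores only about 0.56 and falls below the cut-off, whereas agrep() matches it as a substring; adist(..., partial = TRUE) measures that substring-style distance instead. Whatever score is used, the threshold needs tuning against a hand-checked sample of pairs.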



-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" W