[R] library/function to compare two phrases?
R. Michael Weylandt
michael.weylandt at gmail.com
Sun Nov 18 00:20:44 CET 2012
On Sat, Nov 17, 2012 at 11:00 PM, Brian Feeny <bfeeny at mac.com> wrote:
> I am looking for a library/function in R that can compare two phrases and give me a score, or somehow classify them as correctly as possible.
>
> The "phrases" are obfuscated/messy. I am not concerned about which is "correct" (for example, spell checking); I am only concerned with grouping them
> so that I know which are the closest matches.
>
> Example:
>
> I have ROW1 and ROW2 like so:
>
> ROW1                  ROW2
> hamburger helper      bigmc heartkcatta
> chicken nuggets       chicke, nuggets, jss
> bigmac heartattack    some sombody somehwere
> somebody somehwere    repleh regrubmah
>
> I am looking for something that can tell me that the best match for hamburger helper is repleh regrubmah, and the same for each other row.
>
> So my goal is to write a program that, for each phrase in ROW1, runs this function against every phrase in ROW2 and gives me the phrase that scored best.
>
> I have read over many of the NLP packages at http://cran.r-project.org/web/views/NaturalLanguageProcessing.html
>
> I thought lsa might be a good fit, but I am not sure. I have limited time, so I am hoping someone can point me in a direction of what I am looking for.
>
> I have been searching for "text classifiers", perhaps this problem is referred to as something else.
>
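This is outside my expertise, but if memory serves, you might benefit
from looking into the Levenshtein distance (also called edit distance),
which counts the single-character insertions, deletions and
substitutions needed to turn one string into another and so allows this
sort of fuzzy matching of strings.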
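
For what it's worth, adist() in the utils package computes a generalized
Levenshtein distance, so a minimal sketch (not a polished solution) might
look like the following. The vectors row1 and row2 are just the example
data from the question, and reverse_string() is a made-up helper. It is
there only because one of the example matches is the phrase spelled
backwards, which plain edit distance would not catch on its own.

```r
# Example data copied from the question (ROW1 and ROW2).
row1 <- c("hamburger helper", "chicken nuggets",
          "bigmac heartattack", "somebody somehwere")
row2 <- c("bigmc heartkcatta", "chicke, nuggets, jss",
          "some sombody somehwere", "repleh regrubmah")

# Hypothetical helper: reverse the characters of each string.
reverse_string <- function(x) {
  vapply(strsplit(x, ""),
         function(ch) paste(rev(ch), collapse = ""),
         character(1), USE.NAMES = FALSE)
}

# adist() returns a matrix of generalized Levenshtein distances:
# one row per element of row1, one column per element of row2.
d_forward  <- adist(row1, row2)
d_reversed <- adist(row1, reverse_string(row2))

# Assumption: a reversed phrase counts as a match, so keep the
# smaller of the two distances for every pair.
d <- pmin(d_forward, d_reversed)

# For each phrase in row1, pick the row2 phrase with the lowest distance.
best_index <- apply(d, 1, which.min)
best <- row2[best_index]

result <- data.frame(phrase     = row1,
                     best_match = best,
                     distance   = d[cbind(seq_along(row1), best_index)])
print(result)
```

Note that which.min() silently takes the first candidate when there is a
tie, and raw edit distance favours phrases of similar length, so for
messier data you may want to divide by the length of the longer string
before comparing.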
MW