[R] Comparing/diffing strings

Doran, Harold HDoran at air.org
Tue Aug 24 16:27:45 CEST 2010


There is the stringMatch function in the MiscPsycho package. 

> stringMatch('Hadley', 'Hadley Wickham', normalize = 'no')
[1] 8
> stringMatch('Hadley', 'Hadley Wickham', normalize = 'yes')
[1] 0.4285714

It uses Levenshtein distance to tell you how much the two strings differ, either normalized or not. The first result above says the first string differs from the second by 8 insertions/deletions/substitutions. The second result normalizes the comparison so that 1 denotes perfect agreement and values closer to 0 denote less agreement.

Examples of an exact match are below.

> stringMatch('Hadley Wickham', 'Hadley Wickham', normalize = 'yes')
[1] 1
> stringMatch('Hadley Wickham', 'Hadley Wickham', normalize = 'no')
[1] 0
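
If you would rather not install a package, the same numbers can probably be reproduced with adist() from the utils package in current versions of R. This is only a sketch: normalized_match() is a hypothetical helper, not part of MiscPsycho, and the normalization (1 minus the distance divided by the length of the longer string) is my assumption. It does agree with the 0.4285714 shown above, since 1 - 8/14 = 0.4285714.

```r
# Sketch in base R only. normalized_match() is a hypothetical helper and the
# normalization formula is an assumption, not taken from the MiscPsycho source.
normalized_match <- function(a, b) {
  d <- drop(adist(a, b))              # generalized Levenshtein distance
  longest <- max(nchar(a), nchar(b))  # length of the longer string
  if (longest == 0) return(1)         # two empty strings agree perfectly
  1 - d / longest
}

# Raw edit distance: 8 insertions turn 'Hadley' into 'Hadley Wickham'
drop(adist("Hadley", "Hadley Wickham"))

# Normalized agreement: 1 means identical strings
normalized_match("Hadley", "Hadley Wickham")
normalized_match("Hadley Wickham", "Hadley Wickham")

# counts = TRUE attaches a "trafos" attribute describing the edit sequence
# (match / insert / delete / substitute), which is closer to an actual diff
attr(adist("Hadley", "Hadley Wickham", counts = TRUE), "trafos")
```

The "trafos" attribute may be the more useful piece for long strings, since it says where the edits fall rather than only how many there are.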

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Hadley Wickham
Sent: Tuesday, August 24, 2010 10:17 AM
To: R-help
Subject: [R] Comparing/diffing strings

Hi all,

all.equal is generally very useful when you want to find the
differences between two objects.  It breaks down, however, when you
have two long strings to compare:

> all.equal(a, b)
[1] "1 string mismatch"

Does anyone know of any good text diffing tools implemented in R?

Thanks,

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
