[R] Comparing/diffing strings
(Ted Harding)
Ted.Harding at manchester.ac.uk
Tue Aug 24 16:38:24 CEST 2010
On 24-Aug-10 14:16:55, Hadley Wickham wrote:
> Hi all,
> all.equal is generally very useful when you want to find the
> differences between two objects. It breaks down however,
> when you have two long strings to compare:
>
>> all.equal(a, b)
> [1] "1 string mismatch"
>
> Does anyone know of any good text diffing tools implemented in R?
>
> Thanks,
> Hadley
Hi Hadley,
I suppose it depends on what you want to find out:
all.equal(strsplit("abcdefg", split = ""), strsplit("aBcDEfg", split = ""))
# [1] "Component 1: 3 string mismatches"
will tell you how many mismatches there are. But, if you want to
find out *what* they are, and/or where, then you would have to do
something like
X <- "abcdefg"
Y <- "aBcDEfg"
X0 <- unlist(strsplit(X, split = ""))  ## Nasty but necessary!
Y0 <- unlist(strsplit(Y, split = ""))  ## ...
ix <- which(X0 != Y0)  ## positions that differ (assumes equal lengths)
cbind(ix, X0[ix], Y0[ix])
#      ix
# [1,] "2" "b" "B"
# [2,] "4" "d" "D"
# [3,] "5" "e" "E"
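The comparison above relies on X and Y having the same number of
characters; otherwise X0 != Y0 recycles the shorter vector. A minimal
sketch of the same idea wrapped in a function, padding the shorter
string with NA, might look like the following. The name char_diff and
its column names are made up for illustration and are not part of R.

```r
## char_diff() is a hypothetical helper, not a base R function.
## It reports the positions at which two single strings differ.
char_diff <- function(x, y) {
  x0 <- strsplit(x, split = "")[[1]]
  y0 <- strsplit(y, split = "")[[1]]
  n <- max(length(x0), length(y0))
  ## Pad the shorter vector with NA so both have length n
  length(x0) <- n
  length(y0) <- n
  ## A position differs if exactly one side is missing,
  ## or both are present and the characters are not equal
  ix <- which(is.na(x0) != is.na(y0) |
              (!is.na(x0) & !is.na(y0) & x0 != y0))
  data.frame(pos = ix, x = x0[ix], y = y0[ix], stringsAsFactors = FALSE)
}

char_diff("abcdefg", "aBcDEfg")  ## differences at positions 2, 4 and 5
char_diff("abc", "abcd")         ## position 4: NA on the x side, "d" on the y side
```

Returning a data frame rather than cbind() keeps the positions as
integers instead of coercing everything to character. Note that this is
still a position-by-position comparison, so a single inserted character
makes every later position show up as a mismatch.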
Hoping this helps,
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 24-Aug-10 Time: 15:38:22
------------------------------ XFMail ------------------------------