[R] Comparing/diffing strings

(Ted Harding) Ted.Harding at manchester.ac.uk
Tue Aug 24 16:38:24 CEST 2010


On 24-Aug-10 14:16:55, Hadley Wickham wrote:
> Hi all,
> all.equal is generally very useful when you want to find the
> differences between two objects.  It breaks down however,
> when you have two long strings to compare:
> 
>> all.equal(a, b)
> [1] "1 string mismatch"
> 
> Does anyone know of any good text diffing tools implemented in R?
> 
> Thanks,
> Hadley

Hi Hadley,
I suppose it depends on what you want to find out:

  all.equal(strsplit("abcdefg",split=""),strsplit("aBcDEfg",split=""))
  # [1] "Component 1: 3 string mismatches"

will tell you how many mismatches there are. But if you want to
find out *what* they are, and/or where, then you would have to do
something like

  X  <- "abcdefg" ; Y <- "aBcDEfg"
  X0 <- unlist(strsplit(X,split=""))  ## Nasty but necessary: split into single characters
  Y0 <- unlist(strsplit(Y,split=""))  ## ...
  ix <- which(X0 != Y0)               ## positions that differ (assumes equal lengths)
  cbind(ix,X0[ix],Y0[ix])
  #      ix         
  # [1,] "2" "b" "B"
  # [2,] "4" "d" "D"
  # [3,] "5" "e" "E"
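
The same idea can be wrapped in a small function. What follows is only
a sketch: the name char_diff and its padding behaviour are my own
assumptions, not anything provided by R. It compares position by
position, so it is not a true diff: a single inserted or deleted
character will make every later position show up as a mismatch.

```r
# Hypothetical helper (not part of base R): position-wise comparison
# of two single strings, returning the differing positions and characters.
char_diff <- function(x, y) {
  x0 <- strsplit(x, split = "")[[1]]
  y0 <- strsplit(y, split = "")[[1]]
  n  <- max(length(x0), length(y0))
  # Pad the shorter vector with NA so both have length n
  length(x0) <- n
  length(y0) <- n
  # A position past the end of one string (NA) counts as a difference
  ix <- which(is.na(x0) | is.na(y0) | x0 != y0)
  data.frame(pos = ix, x = x0[ix], y = y0[ix], stringsAsFactors = FALSE)
}

char_diff("abcdefg", "aBcDEfg")
```

For strings of unequal length, the surplus characters of the longer
string are reported against NA rather than being recycled.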

Hoping this helps,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 24-Aug-10                                       Time: 15:38:22
------------------------------ XFMail ------------------------------


