[R] testing whether two character vectors contain (the same) items in the same order

Boris Steipe boris.steipe at utoronto.ca
Thu Aug 6 19:17:34 CEST 2015


You are looking for what is known as the "Cayley distance" between vectors - an edit distance that allows only transpositions, i.e. the minimum number of swaps of two elements needed to turn one ordering into the other. RSeek mentions PerMallows (https://cran.r-project.org/web/packages/PerMallows/PerMallows.pdf) and Rankcluster (https://cran.r-project.org/web/packages/Rankcluster/Rankcluster.pdf) as packages that support work with Cayley distances. It seems to me that distCayley() in Rankcluster does what you want. From the examples:

x=1:5
y=c(2,3,1,4,5)
distCayley(x,y)
[1] 2
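
If you would rather not install a package, the same quantity can be sketched in base R. This is only a minimal sketch, assuming both vectors hold the same distinct items; cayley_distance() is a hypothetical helper name of my own, not a function from Rankcluster or PerMallows. It relies on the fact that the Cayley distance equals the vector length minus the number of cycles in the permutation that maps one ordering onto the other, and it works directly on character vectors such as the ones in the original question:

```r
# Hypothetical helper (not part of any package): Cayley distance between two
# orderings of the same distinct items, i.e. the minimum number of pairwise
# swaps needed to turn one ordering into the other.
cayley_distance <- function(x, y) {
  # Assumption: x has no duplicates and y is a reordering of x
  stopifnot(length(x) == length(y), !anyDuplicated(x), setequal(x, y))
  perm <- match(y, x)      # position in x of each element of y
  n <- length(perm)
  seen <- logical(n)
  cycles <- 0L
  for (i in seq_len(n)) {
    if (!seen[i]) {
      cycles <- cycles + 1L
      j <- i
      while (!seen[j]) {   # walk the cycle that contains position i
        seen[j] <- TRUE
        j <- perm[j]
      }
    }
  }
  n - cycles               # distance = length minus number of cycles
}

cayley_distance(1:5, c(2, 3, 1, 4, 5))   # 2: one 3-cycle, two fixed points

x <- c("z", "t", "g", "l", "u", "d", "w", "x", "a", "q", "k", "j", "f", "n", "v")
y <- x[c(1:7, 9:8, 10:12, 14, 15, 13)]
cayley_distance(x, y)                    # 3: one swap plus one 3-cycle
```

Note that this only counts swaps; it does not by itself say how likely it is that one noisy process produced both orderings.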


Cheers,
Boris





On Aug 6, 2015, at 9:51 AM, Federico Calboli <federico.calboli at helsinki.fi> wrote:

>> 
>> On 6 Aug 2015, at 15:40, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>> 
>> Define "goodness of match".  For exact matches, see ?"==", all.equal, etc.
> 
> Fair point.  I would define it as a number that tells me how likely it is that the same (noisy) process produced both lists.
> 
> BW
> 
> F
> 
> 
> 
> 
>> 
>> Bert
>> 
>> On Thursday, August 6, 2015, Federico Calboli <federico.calboli at helsinki.fi> wrote:
>> Hi All,
>> 
>> let’s assume I have a vector of letters drawn only once from the alphabet:
>> 
>> x = sample(letters, 15, replace = FALSE)
>> x
>> [1] "z" "t" "g" "l" "u" "d" "w" "x" "a" "q" "k" "j" "f" "n" "v"
>> 
>> y = x[c(1:7,9:8, 10:12, 14, 15, 13)]
>> 
>> I would now like to test how good a match y is for x.  Obviously I can transform the letters into numbers and use a rank test, but I was left wondering whether this is the only solution and whether there are more appropriate solutions that are already implemented in R (I am not going to reinvent the wheel if I can avoid it).
>> 
>> BW
>> 
>> F
>> 
>> 
>> --
>> Federico Calboli
>> Ecological Genetics Research Unit
>> Department of Biosciences
>> PO Box 65 (Biocenter 3, Viikinkaari 1)
>> FIN-00014 University of Helsinki
>> Finland
>> 
>> federico.calboli at helsinki.fi
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>> 
>> 
>> -- 
>> Bert Gunter
>> 
>> "Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
>>   -- Clifford Stoll
>> 
> 
> 


