[Bioc-sig-seq] stringDist; hamming

Patrick Aboyoun paboyoun at fhcrc.org
Mon Jun 21 19:19:18 CEST 2010


Ludo,
Thanks for your bug report. As Harris mentioned in a private e-mail, 
there was an issue at the C-level that resulted in the Hamming distance 
being inappropriately capped at 1. I just fixed this in BioC 2.6 
(Biostrings 2.16.6) and BioC 2.7 (Biostrings 2.17.8). You can obtain 
these new versions from svn directly now, or wait approximately 24-36 
hours to download them via bioconductor.org and biocLite.
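
If you want to confirm that an installed copy already carries the fix, the version numbers above can be checked along these lines. This is only a sketch: `hasHammingFix` is a hypothetical helper name, not part of Biostrings, and it assumes the only relevant lines are the 2.16.x release and 2.17.x devel series mentioned above.

```r
# Hypothetical helper (not part of Biostrings): TRUE if a Biostrings
# version string is at or past the fixed versions named above.
hasHammingFix <- function(version) {
  v <- package_version(version)
  if (v >= "2.17.0") {
    v >= "2.17.8"   # BioC 2.7 (devel) line
  } else {
    v >= "2.16.6"   # BioC 2.6 (release) line
  }
}

hasHammingFix("2.16.5")   # FALSE: predates the fix
hasHammingFix("2.17.8")   # TRUE

# Against the installed package (requires Biostrings to be installed):
# hasHammingFix(as.character(packageVersion("Biostrings")))
```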


 > words <- c("lazy", "hazy", "dasy")
 > stringDist(words, method='hamming')
   1 2
2 1
3 2 2
 > as.matrix(stringDist(words, method='hamming'))
   1 2 3
1 0 1 2
2 1 0 2
3 2 2 0
 > sessionInfo()
R version 2.11.1 Patched (2010-05-31 r52167)
i386-apple-darwin9.8.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Biostrings_2.16.5 IRanges_1.6.8

loaded via a namespace (and not attached):
[1] Biobase_2.8.0
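
To cross-check the matrix above independently of Biostrings, a base-R sketch such as the following should give the same numbers. `hammingDist` is a hypothetical helper written for illustration, not the C-level code used by stringDist, and it assumes plain character strings of equal length.

```r
# Hypothetical helper, base R only: pairwise Hamming distances for
# equal-length strings, returned as a "dist" object like stringDist().
hammingDist <- function(x) {
  stopifnot(is.character(x), length(unique(nchar(x))) <= 1L)
  chars <- strsplit(x, "")          # one vector of single characters per string
  n <- length(x)
  m <- matrix(0L, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # count positions at which the two strings differ
      m[i, j] <- sum(chars[[i]] != chars[[j]])
    }
  }
  as.dist(m)
}

words <- c("lazy", "hazy", "dasy")
as.matrix(hammingDist(words))
```

For these three words the distances are 1 (lazy/hazy), 2 (lazy/dasy) and 2 (hazy/dasy), matching the fixed stringDist output.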



Patrick



On 6/21/10 7:15 AM, Ludo Pagie wrote:
> Hi all,
>
> I want to calculate the Hamming distance between equal-length
> strings, i.e., the number of substitution differences between two
> strings.
> From the help page of 'stringDist' I think the following should
> return the same results, but they don't. What am I doing/seeing
> wrong?
>
> words<- c("lazy", "hazy", "dasy")
> sapply(words, neditStartingAt,'lazy',starting.at=1)
> lazy hazy dasy
>     0    1    2
> stringDist(words,method='hamming')
>   1 2
> 2 1
> 3 1 1
>
> I want the result as returned by neditStartingAt, clearly.
>
>
> > sessionInfo()
> R version 2.12.0 Under development (unstable) (2010-06-17 r52313)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
> [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] Biostrings_2.17.7 IRanges_1.7.7
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.9.0 tools_2.12.0
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>


