version 0.9.14
- Fixed issue with zero-length strings in 'qgrams' (Thanks to Brian Ripley for the
  notification and pointer to the origin of the problem)

version 0.9.12
- apparently R_xlen_t is long long int on CLANG/Windows and long int on gcc-13/debian

version 0.9.11
- Fixed a warning in gcc-13: changed specifier from %d to %ld. (Thanks to Kurt Hornik
  for the heads-up)

version 0.9.10
- Fixed another warning generated by new C compiler that I overlooked. (Thanks to the
  CRAN team for the heads-up)

version 0.9.9
- Fixed warnings generated by new C compiler (function prototypes must now be defined
  completely). (Thanks to Kurt Hornik for the heads-up.)

version 0.9.8
- Fixed some issues on C-level causing problems with the CLANG compiler. (Thanks to
  Brian Ripley for not only reporting this, but also sending updated code with fixes).

version 0.9.7
- Fixes in use of INTEGER() and VECTOR_ELT() after updates in R's C API. This affected
  'afind' and 'max_length' (internally). (Thanks to Luke Tierney and Kurt Hornik for
  the notification).
- Fix in 'amatch' causing utf-8 characters to be ignored in some cases (thanks to
  Joan Mime for reporting #78).
- Fix: segfault when 'afind' was called with many search patterns or many texts to
  be searched.
- Fix: stringsimmatrix was not normalized correctly (Thanks to Tamas Ferenci for
  reporting GH).

version 0.9.6.3
- Resubmit. Fixed a URL redirect that was detected by CRAN.

version 0.9.6.2
- Resubmit. Fixed url issues detected by CRAN, added doi to description as per CRAN
  request.

version 0.9.6.1
- Bugfix: afind/grab/grabl returned wrong results on MacOS only. (thanks to Prof. Brian
  Ripley for the notification and for running tests on his personal machine and to
  Tomas Kalibera for making the ubuntu-rchk docker image available).

version 0.9.6
- New function 'afind': find approximate matches in text based on string distance.
- New functions 'grab', 'grabl': fuzzy matching equivalent to 'grep' and 'grepl'.
- New function 'extract': fuzzy matching equivalent of stringr::str_extract.
- New algorithm 'running_cosine': fast fuzzy text search using cosine distance.
- New function 'stringsimmatrix' (Thanks to Johannes Gruber).
- Number of threads used is now reported when loading 'stringdist'.
- Internal fixes (in some cases class() == 'class' was used).

version 0.9.5.5
- Changed two URLs to canonical form in README.md (https://) to comply with CRAN policy.

version 0.9.5.4
- Some tests using seq_dist() would fail unpredictably when the input was defined
  with lazily evaluated arguments, e.g. list(1:3, 2:4); but only in the context of
  NSE by a test suite ('tinytest', 'testthat'). Tests were replaced by literal
  versions, e.g. list(c(1,2,3), c(2,3,4)).

version 0.9.5.3
- Update in test suite to stay on CRAN

version 0.9.5.2
- RJournal paper and C/C++ api docs are now presented as vignette.
- Switched to tinytest framework
- Fix: stringdist could cause a segfault for edit distances between very long strings.
  (Thanks to GH user gllipatz)

version 0.9.5.1
- Fixed header file for C API

version 0.9.5.0
- New contributor: Chris Muir
- C/C++ API now exposed for packages LinkingTo stringdist. See `?stringdist_api`
- Arguments 'maxDist', 'ncores', 'cluster' of functions 'stringdist' and
  'stringdistmatrix' have been deprecated for several years and are now removed.
- Fixed edge case where cosine distance with q=1, between strings of repeating
  characters yielded Inf (Thanks to Markus Dumke)

version 0.9.4.6
- Fixed argument passing error in lower_tri (thanks to Kurt Hornik)

version 0.9.4.5
- New argument 'bt' implementing Winkler's boost threshold for the Jaro-Winkler distance
- stringdist(a,b,method="qgram") returns correct value when q>nchar(a) (or b).
  (Thanks to Giora Simchoni). Also affects stringdistmatrix, amatch, seq_dist, and
  seq_distmatrix.
- registered native routines as now recommended by CRAN

version 0.9.4.4
- updated default nr of threads to comply with CRAN policy (thanks to Kurt Hornik).
  The default nr of cores now equals OMP_NUM_THREADS if set. See
  ?'stringdist-parallelization' for the full policy.

version 0.9.4.2
- bugfix in stringdistmatrix(a): value of p, for jw-distance was ignored (thanks to
  Max Fritsche)
- bugfix in stringdistmatrix(a): Would segfault on q-gram w/input > ~7k strings and
  q>1 (thanks to Connor McKay)
- bugfix in jaccard distance: distance not always correct when passing multiple
  strings (thanks to Robert Carlson)

version 0.9.4.1
- stringdistmatrix(a) now outputs long vectors (issue #45, thanks to Wouter Touw).
  For stringdistmatrix(a,b) this was already the case, but the length of rows and
  columns remains restricted to 2^31-1 since long input vectors are not supported (yet).
- bugfix in osa/dl/lv distances w/unequal edit weights (thanks to Nathalia Potocka)

version 0.9.4
- bugfix: edge case for zero-size for lower triangular dist matrices (caused UBSAN
  to fire, but gave correct results).
- bugfix in jw distance: not symmetric for certain cases (thanks to github user
  gtumuluri)

version 0.9.3
- new function for tokenizing integer sequences: seq_qgrams
- new function for matching integer sequences: seq_amatch
- new functions computing distances between integer sequences: seq_dist, seq_distmatrix
- q-gram based distances are now always 0 when q=0 (used to be Inf if at least one
  of the arguments was not the empty string)
- stringdist, stringdistmatrix now emit warning when presented with 'list' argument
- small c-side code optimizations
- bugfix in dl, lv, osa distance: weights were not taken into account properly
  (thanks to Zach Price)

version 0.9.2
- Update fixing some errors (missing documentation, tests) in the 0.9.1 release.
- Fixed a few possible memory leaks.

version 0.9.1
- Argument 'useNames' of 'stringdistmatrix' now accepts 'none', 'strings', and 'names'
- New function 'stringsim' computes string similarities between 0 and 1 based on
  'stringdist'
- Calling 'stringdistmatrix' with a single argument returns an object of class 'dist'
- Argument 'cluster' to stringdistmatrix is phased out. It is now ignored with a message.
- Specifying 'ncores' was already ignored but now also causes a warning
- internal: rewrite of the R/C interface, saving about 1/3 of C-code, making
  extending easier
- bugfix in stringdistmatrix: output was transposed when length(a)==1 (Thanks to
  github user cpoonolly)
- Safer core detection to avoid a failure under Cygwin (thanks to Lauri Koobas)

version 0.9.0
- C-code underlying stringdist and amatch now automatically use multithreading based
  on openMP. The default number of threads is governed by options('sd_num_thread').
- stringdist, stringdistmatrix, amatch and ain gain nthread argument which can
  overwrite the default maximum number of threads.
- Argument 'maxDist' is phased out for 'stringdist' and 'stringdistmatrix'.
  Specifying it causes a message.
- Argument 'ncores' is phased out for 'stringdistmatrix'. It is now ignored and
  specifying it causes a message.
- bugfix in amatch/dl. In certain cases, the best match went undetected.
- Documentation improved and rearranged with string metrics, encoding, and
  parallelization now documented as separate topics.

version 0.8.2
- Fixed a few warnings issued by the CLANG compiler (thanks to Brian Ripley). This
  fixes a bug in amatch/jaccard
- Fixed a bug in stringdist/osa, dl: NA incorrectly returned (thanks to Lauri Koobas).

version 0.8.1
- stringdistmatrix returns dimensionless matrix when both arguments have length zero
  (thanks to Richie Cotton)
- stringdistmatrix gains argument 'useNames' (thanks to Richie Cotton)
- Package now 'Imports' parallel rather than 'Depends' on it.
- bugfix in optimal string alignment distance: the nr of transpositions was sometimes
  overcounted (thanks to Frank Binder)
- rearranged the documentation.

version 0.8.0
- Added soundex-based string distance (thanks to Jan van der Laan)
- New function 'phonetic' translates strings to phonetic codes using soundex (thanks
  to Jan van der Laan)
- New function 'printable_ascii' detects non-printable ascii or non-ascii characters.
- Precision issue: cosine distance between equal strings would be O(1e-16) instead
  of 0.0 (thanks to Ben Haller).
- Code cleaning: somewhat better performance when maxDist is unspecified in
  stringdist. It remains deprecated.
- Row names in the output array of 'qgrams' are now in system native encoding (used
  to be utf8 for all systems).
- updated CITATION with page number info as the R Journal is now out.

version 0.7.3
- bugfix in jw-distance: out-of-range access in C-code caused R to crash in some
  cases (thanks to Carol Gan)
- bugfix in dl distance: in some cases, distances could be one unit too high.
- Updated CITATION file: paper to appear in The R Journal vol 6 (2014).
- Some updates in documentation.

version 0.7.2
- function 'qgrams' gains .list argument
- bugfix in multicore option of stringdistmatrix
- bugfix in substitution weight of DL-distance (undercounted when w4 != 1 in some cases)
- bugfix in dl.c: C-function read outside of array.

version 0.7.0
- added useBytes option: up to ~3-fold speed gain at the cost of possible
  encoding-dependent results.
- new memory allocation method for q-grams increases speed between ~5% and ~30%
  depending on q and input string.
- function 'qgrams' gains useNames option.
- jaro-winkler distance gains weight argument.
- C-code optimization in edit-based distances: 10~20% speed increase depending on input.
- bugfix in amatch: sometimes NA was erroneously returned.
- bugfix in amatch/lcs: hamming distance method was called erroneously.

version 0.6.1
- bugfix in parallel version of stringdistmatrix: parameter p was not passed (thanks
  to Ricardo Saporta)
- bugfix in lv/osa/dl: maxDist ignored in certain cases

version 0.6.0
- added amatch function: approximate matching version of 'match'
- added ain function: approximate matching version of '%in%'
- qgrams now accepts arbitrary number of arguments. Outputs array, not table
- added cosine distance
- added Jaccard distance
- added Jaro and Jaro-Winkler distances
- small performance tweaks in underlying C code
- Edge case in stringdistmatrix: output is now always of class matrix
- Default maxDist is now Inf (this is only to make it more intuitive and does not
  break previous code)
- BREAKING CHANGE: output -1 is replaced by Inf for all distance methods

version 0.5.0
- added qgram counting function 'qgrams'
- faster edge case handling in osa method.
- edge case in lv/osa/dl methods: distance returned length(b) instead of -1 when
  length(a) == 0, maxDist < length(b).
- bugfix in lv/osa/dl method: maxDist returned when length(a) > maxDist > 0 (thanks
  to Daniel Reckhard).
- Hamming distance (method='h') now returns -1 for strings of unequal lengths (used
  to emit error).
- added longest common substring distance (method='lcs').
- added qgram distance method.
- stringdistmatrix gains cluster argument.

version 0.4.2
- Fix in error message for hamming distance
- Workaround for system-dependent translation of utf8 NA characters

version 0.4.0
- First release