[R] Unicode Text Segmentation Algorithms already implemented in R?

Ista Zahn istazahn at gmail.com
Thu Mar 3 14:44:39 CET 2016


You searched, but did not tell us what you found, nor why it was unsuitable
for your undescribed use case. So all we can do is guess; my guess is
http://docs.rexamine.com/R-man/stringi/stringi-search-boundaries.html
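
For illustration, here is a minimal sketch of what that page describes,
assuming the stringi package is installed. The sample sentence is made up,
and whether the result matches what Stata's ustrword returns has not been
checked here.

```r
library(stringi)

# Hypothetical sample text, not taken from the original question
x <- "The quick brown fox can't jump 32.3 feet, right?"

# Split at word boundaries using ICU's word break iterator.
# skip_word_none = TRUE drops the pieces that are only whitespace
# or punctuation, so that only word-like tokens remain.
words <- stri_split_boundaries(x, type = "word", skip_word_none = TRUE)[[1]]
print(words)

# stri_extract_all_words() is a convenience function for the same task
print(stri_extract_all_words(x)[[1]])

# Picking the n-th word by index, which may be roughly what the
# question is after (an assumption about the intended use case)
n <- 3
print(words[n])
```

Both functions return a list with one element per input string, hence the
[[1]] when only a single string is passed.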

Best,
Ista
On Mar 3, 2016 8:14 AM, "Sascha Wolfer" <wolfer at ids-mannheim.de> wrote:

> Hello list members,
>
> I am looking for an implementation of Unicode text segmentation (word
> boundary detection) algorithms in R. You can find information about the
> algorithms here: http://www.unicode.org/reports/tr29/#Word_Boundaries
>
> The help page for the function 'casefuns' from the excellent 'Unicode'
> package says: "Other methods will be added eventually (once the Unicode
> text segmentation algorithm is implemented for detecting word boundaries)."
> My simple question is: Are these algorithms already implemented in an R
> package? I didn't find anything on the web, but I am counting on the power
> of this list. My Stata-using colleague is already picking at me... (in Stata,
> the function 'ustrword' does exactly what I want to do in R).
>
> Thanks for your help, have a good day, you all!
> Sascha W.
