[R] stringi behaves differently in 2 similar situations
Sarah Goslee
sarah.goslee at gmail.com
Wed Nov 30 22:27:30 CET 2016
A dot is treated differently depending on whether it has a digit on
neither side, on one side, or on both sides.
> stri_extract_all_words("me.com", simplify = TRUE)
     [,1]
[1,] "me.com"
> stri_extract_all_words("me1.com", simplify = TRUE)
     [,1]  [,2]
[1,] "me1" "com"
> stri_extract_all_words("me1.2com", simplify = TRUE)
     [,1]
[1,] "me1.2com"
?stri_extract_all_words
sent me to
?"stringi-search-boundaries"
which suggests that you should spend some time with the user guide:
_Boundary Analysis_ - ICU User Guide,
<URL: http://userguide.icu-project.org/boundaryanalysis>
Depending on your objective, you might be better off with strsplit()
separating on whitespace.
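
As a rough sketch of the strsplit() route, assuming your words are
separated only by whitespace (the sample sentence is made up; only the
two dotted tokens come from this thread):

```r
# Hypothetical input string; only "me1.com" and "watch32.com" are from the thread.
x <- "visit me1.com or watch32.com today"

# Split on runs of whitespace, so dots inside a token are never touched.
tokens <- strsplit(x, "\\s+")[[1]]
print(tokens)
# [1] "visit"       "me1.com"     "or"          "watch32.com" "today"
```

Note that this keeps all punctuation attached to the tokens, and a string
with leading whitespace yields an empty first element, so you may want
trimws() first.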
Sarah
On Wed, Nov 30, 2016 at 3:51 PM, Dimitri Liakhovitski
<dimitri.liakhovitski at gmail.com> wrote:
> Hello!
>
> library(stringi)
>
> stri_extract_all_words("me.com", simplify = TRUE) # returns with a dot
> stri_extract_all_words("watch32.com", simplify = TRUE) # removes the dot
>
> Why is the dot removed only in the second case?
> How is it possible to ask it NOT to remove the dot in the second case?
>
> Thanks a lot!
>
--
Sarah Goslee
http://www.functionaldiversity.org