[R] regexpr with accents
Rui Barradas
ruipbarradas at sapo.pt
Mon Aug 6 08:22:23 CEST 2012
Hello,
It works for me:
d1 <- data.frame(V1 = 1:3,
                 V2 = c("some text = 9", "some tèxt = 9", "some other text = 9"))
regexpr("some text = 9", d1$V2)
[1] 1 -1 -1
attr(,"match.length")
[1] 13 -1 -1
regexpr("some tèxt = 9", d1$V2)
[1] -1 1 -1
attr(,"match.length")
[1] -1 13 -1
d1$V1[regexpr("some text = 9",d1$V2) > 0] <- 9
d1$V1[regexpr("some tèxt = 9",d1$V2) > 0] <- 9
d1
  V1                  V2
1  9       some text = 9
2  9       some tèxt = 9
3  3 some other text = 9
What do you mean by "it did not work"? What were the contents of 'd1'?
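If the aim is to match the string whether or not the letter carries an accent, a character class is one option. This is only a minimal sketch under assumptions: the data frame is a hypothetical copy of the example above, the name 'hit' is made up for illustration, and the accented letter is written as the escape \u00e8 so the code does not depend on how the script file itself is encoded.

```r
# Hypothetical data mirroring the example in this message
d1 <- data.frame(V1 = 1:3,
                 V2 = c("some text = 9", "some t\u00e8xt = 9", "some other text = 9"),
                 stringsAsFactors = FALSE)

# The class [e\u00e8] accepts either the plain or the accented letter
hit <- grepl("some t[e\u00e8]xt = 9", d1$V2)

# Same assignment as before, applied to every row that matched
d1$V1[hit] <- 9
d1
```

grepl() returns a logical vector directly, so the "> 0" comparison needed with regexpr() can be dropped.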
sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=Portuguese_Portugal.1252 LC_CTYPE=Portuguese_Portugal.1252
[3] LC_MONETARY=Portuguese_Portugal.1252 LC_NUMERIC=C
[5] LC_TIME=Portuguese_Portugal.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] fortunes_1.5-0
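The locale is shown because one possible cause, not confirmed anywhere in this thread, is that the pattern and the data are stored in different encodings. A rough way to check, assuming a hypothetical vector 'x' standing in for d1$V2, is to look at the declared encodings and convert both sides to UTF-8 before matching:

```r
# Hypothetical stand-in for d1$V2
x <- c("some text = 9", "some t\u00e8xt = 9")

# Declared encoding of each element ("unknown" means ASCII or native)
Encoding(x)

# Convert pattern and data to UTF-8 so both sides use the same bytes
pat <- enc2utf8("some t\u00e8xt = 9")
hit <- grepl(pat, enc2utf8(x), fixed = TRUE)
hit
```

With fixed = TRUE the pattern is treated as a literal string, not as a regular expression, which is enough for a plain substring test.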
Hope this helps,
Rui Barradas
On 06-08-2012 06:55, Luca Meyer wrote:
> Hello,
>
> I have built a statement to find out whether a given substring is included in a larger string. It works like this:
>
> d1$V1[regexpr("some text = 9",d1$V2)>0] <- 9
>
> and this works fine as long as "some text" contains only standard ASCII characters. However, it does not work when accents are included, as in the following:
>
> d1$V1[regexpr("some tèxt = 9",d1$V2)>0] <- 9
>
> I have tried replacing "è" with several wildcards, but that did not work. Can anyone suggest how to make the pattern match the string while ignoring the accent?
>
> Thank you in advance,
>
> Luca