[R] regex question on escaping "." (and a couple other regex questions as well)
David Winsemius
dwinsemius at comcast.net
Thu Jan 7 21:07:40 CET 2010
On Jan 7, 2010, at 2:47 PM, Mark Kimpel wrote:
> I have an example where escaping "." does not seem to be behaving
> consistently, but perhaps it is due to my misunderstanding. Could someone
> explain to me why the below produces the output it does?
>
> It seems to me that the second example, where I am being more precise
> about specifying that a "." (dot) should be between the numbers, should
> produce the same output as the first example, but it does not.
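There is an intervening "0" between the digits matched by the [1-9] group and
the first period, so "160." cannot match "[1-9]+\\.". In your first pattern the
unescaped "." matches any single character, so it can absorb a digit and let
the match succeed.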
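A minimal sketch of the difference, using only base R. The names p1, p2 and ok are mine, and the anchored pattern ok is just one possible fix (it assumes you want four dot-separated groups of digits), not the only one:

```r
a <- "160.15.05.00"

p1 <- "[1-9]+.[0-9]+\\.[0-9]+\\.[0-0]+"    # first dot unescaped
p2 <- "[1-9]+\\.[0-9]+\\.[0-9]+\\.[0-0]+"  # first dot escaped

# Extract the substring the first pattern actually matched:
# "1" for [1-9]+, "6" for the bare ".", "0" for [0-9]+, and so on.
m1 <- regmatches(a, regexpr(p1, a))
m1                      # "160.15.0"

hit1 <- grepl(p1, a)    # TRUE
hit2 <- grepl(p2, a)    # FALSE: no run of 1-9 digits is followed by three more dotted groups

# One possible fix: allow zeros in every group and anchor both ends.
ok <- "^[0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+$"
hit_ok <- grepl(ok, a)  # TRUE
```

Looking at the matched substring with regmatches() rather than only at the index returned by grep() tends to make this kind of surprise easier to spot.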
>
> As an aside, is there a document or help page that specifies which
> characters need to be escaped to form regex's in R? I can't find one.
?regex # what else?
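That page lists the metacharacters of extended regular expressions as . \ | ( ) [ { ^ $ * + ? and notes that whether they are special depends on context. As a rough sketch (the test vector x is made up for illustration), here are three common ways to get a literal dot:

```r
x <- c("a.c", "abc")   # hypothetical test strings

plain    <- grepl("a.c", x)                # "." matches any character
escaped  <- grepl("a\\.c", x)              # backslash-escaped dot
in_class <- grepl("a[.]c", x)              # dot inside a character class
literal  <- grepl("a.c", x, fixed = TRUE)  # no regex interpretation at all

plain     # TRUE TRUE
escaped   # TRUE FALSE
in_class  # TRUE FALSE
literal   # TRUE FALSE
```

The doubled backslash is needed because the R parser consumes one level of escaping before the regex engine ever sees the pattern.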
>
> Finally, how does one grep for the escape character? I've tried
> grep ("\\", vector)
> grep ("\\\", vector)
> grep("\\\\", vector)
Where or perhaps what is "vector"? Why should we think it has an
"escape character" in it? In some sense I think you have an
epistemological problem. There can be back-slashes in strings, but
they are not escape characters at that point.
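If the question is how to find a literal backslash in a string, here is a small sketch. The vector x is invented for the example, since I do not know what your "vector" holds:

```r
# Hypothetical data: the first element contains one real backslash.
x <- c("C:\\temp", "no slash here")

n <- nchar("\\")   # 1: the source text "\\" is a single backslash character
writeLines(x[1])   # prints C:\temp

# As a regex, the backslash must be escaped for the regex engine (\\),
# and each of those must be escaped again for the R parser.
i_regex <- grep("\\\\", x)

# With fixed = TRUE there is no regex layer, so one parser-level escape is enough.
i_fixed <- grep("\\", x, fixed = TRUE)

i_regex  # 1
i_fixed  # 1
```

So grep("\\\\", vector) is the right regex form; if it returned nothing, the likely explanation is that no element of the vector actually contained a backslash.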
> all without success.
>
> Thanks, Mark
>
> a <- "160.15.05.00"
> grep("[1-9]+.[0-9]+\\.[0-9]+\\.[0-0]+", a)
> # [1] 1
> grep("[1-9]+\\.[0-9]+\\.[0-9]+\\.[0-0]+", a)
> # integer(0)
>
>> sessionInfo()
> R version 2.11.0 Under development (unstable) (2009-12-28 r50849)
> x86_64-unknown-linux-gnu
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] grid stats graphics grDevices datasets utils methods
> [8] base
>
> other attached packages:
> [1] Rgraphviz_1.25.1 graph_1.25.4
>
> loaded via a namespace (and not attached):
> [1] tools_2.11.0
>
> Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
> Indiana University School of Medicine
>
> 15032 Hunter Court, Westfield, IN 46074
>
> (317) 490-5129 Work, & Mobile & VoiceMail
> (317) 399-1219 Skype No Voicemail please
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
Heritage Laboratories
West Hartford, CT