[R] regex question on escaping "." (and a couple other regex questions as well)

David Winsemius dwinsemius at comcast.net
Thu Jan 7 21:07:40 CET 2010


On Jan 7, 2010, at 2:47 PM, Mark Kimpel wrote:

> I have an example where escaping "." does not seem to be behaving
> consistently, but perhaps it is due to my misunderstanding. Could someone
> explain to me why the below produces the output it does?
>
> It seems to me that the second example, where I am being more precise
> about specifying that a "." (dot) should be between the numbers, should
> produce the same output as the first example, but it does not.

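There is an intervening "0" between the digits matched by the [1-9] group
and the first period, so "[1-9]+\\." cannot match "160." and the second
pattern fails. In the first pattern the unescaped "." matches any single
character, which is what lets that match succeed.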
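
A minimal sketch of the difference, using the string from the original
post. regexpr() and regmatches() are base R; the names loose, strict and
relaxed are only illustrative, and the relaxed pattern is an assumption
about what was intended, not something stated in the question.

```r
a <- "160.15.05.00"

# Unescaped "." matches any character, so the engine can backtrack:
# [1-9]+ takes "1", "." takes "6", [0-9]+ takes "0", then the dots line up.
loose <- "[1-9]+.[0-9]+\\.[0-9]+\\.[0-0]+"
regmatches(a, regexpr(loose, a))   # "160.15.0"

# Escaped "\\." demands a literal dot right after a run of 1-9 digits,
# but the character before the first dot is "0", which [1-9] excludes.
strict <- "[1-9]+\\.[0-9]+\\.[0-9]+\\.[0-0]+"
grep(strict, a)                    # integer(0)

# Assumed intent: allow any digit in every group.
relaxed <- "[0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+"
grep(relaxed, a)                   # 1
```

Printing the matched substring with regmatches() rather than only the
index from grep() makes it easier to see which characters each piece of
the pattern actually consumed.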

>
> As an aside, is there a document or help page that specifies which
> characters need to be escaped to form regexes in R? I can't find one.

?regex  # what else?

>
> Finally, how does one grep for the escape character? I've tried
>   grep ("\\", vector)
>   grep ("\\\", vector)
>   grep("\\\\", vector)

Where or perhaps what is "vector"? Why should we think it has an
"escape character" in it? In some sense I think you have an
epistemological problem. There can be backslashes in strings, but once
they are in the string they are ordinary characters, not escape
characters.
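
To make that concrete, here is a small sketch with a hypothetical vector
x (not from the original post). In R source, "\\" is a one-character
string holding a single backslash; a regex then needs that backslash
escaped again, hence four backslashes in the pattern, or fixed = TRUE to
skip regex interpretation altogether.

```r
# Hypothetical data: only the second element contains a literal backslash.
x <- c("no slash here", "C:\\temp", "a/b")

nchar("\\")                   # 1: the string is a single backslash
grep("\\\\", x)               # 2: regex matching one literal backslash
grep("\\", x, fixed = TRUE)   # 2: literal search, no regex escaping
```

If the four-backslash form returns nothing on your data, one possible
explanation is that the strings contain no actual backslash characters;
cat() or nchar() on an element will show what is really stored.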

> all without success.


>
> Thanks, Mark
>
> a <- "160.15.05.00"
> grep("[1-9]+.[0-9]+\\.[0-9]+\\.[0-0]+", a)
> # [1] 1
> grep("[1-9]+\\.[0-9]+\\.[0-9]+\\.[0-0]+", a)
> # integer(0)
>
>> sessionInfo()
> R version 2.11.0 Under development (unstable) (2009-12-28 r50849)
> x86_64-unknown-linux-gnu
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
> [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] grid      stats     graphics  grDevices datasets  utils     methods
> [8] base
>
> other attached packages:
> [1] Rgraphviz_1.25.1 graph_1.25.4
>
> loaded via a namespace (and not attached):
> [1] tools_2.11.0
>
> Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
> Indiana University School of Medicine
>
> 15032 Hunter Court, Westfield, IN  46074
>
> (317) 490-5129 Work, & Mobile & VoiceMail
> (317) 399-1219 Skype No Voicemail please

David Winsemius, MD
Heritage Laboratories
West Hartford, CT


