[R] gsub warning message
Talbot Katz
topkatz at msn.com
Fri Aug 31 21:21:49 CEST 2007
Ah, I think I'm beginning to see the light. Just to complete the final
thought... the "\" is superfluous with the "_" character, so "\\_+" gets
passed to regex as "\_+" and the "\" is ignored in the search; it also would
be ignored in a replacement. However, as you remarked, "." and "\." act
differently in a search but the same in a replacement. I hope I have that
straight now. Thanks much!
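
A quick way to check that summary is sketched below. It assumes R's default regex engine (no perl = TRUE or fixed = TRUE); the variable names and the second test string "a.b_c" are made up for illustration and are not from the thread.

```r
# Illustrative inputs: "AAA_I" is from the thread, "a.b_c" is a made-up test string.
x <- "AAA_I"
y <- "a.b_c"

# Pattern side: the backslash before "_" is redundant, so both patterns match the same text.
res_escaped   <- gsub("\\_+", ".", x)
res_unescaped <- gsub("_+",   ".", x)

# Pattern side: "." and "\\." differ. An unescaped dot matches any character,
# an escaped dot matches only a literal dot.
any_char    <- gsub(".",   "-", y)
literal_dot <- gsub("\\.", "-", y)

# Replacement side: "." and "\\." produce the same output.
rep_plain   <- gsub("_+", ".",   x)
rep_escaped <- gsub("_+", "\\.", x)

print(c(res_escaped, res_unescaped, rep_plain, rep_escaped))
print(c(any_char, literal_dot))
```

If the reading above is right, the first print shows the same string four times, while the second shows two different strings.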
-- TMK --
212-460-5430 home
917-656-5351 cell
>From: "Greg Snow" <Greg.Snow at intermountainmail.org>
>To: "Talbot Katz" <topkatz at msn.com>,ligges at statistik.uni-dortmund.de
>CC: r-help at stat.math.ethz.ch
>Subject: RE: [R] gsub warning message
>Date: Fri, 31 Aug 2007 12:41:37 -0600
>
>What is happening is that before the regex engine can look at your
>pattern, the R string parsing routines first process your input as a
>string. In the string processing there are certain things represented
>using a backslash. Try this code in R:
>
> > cat('here\tthere\n')
>
>The \t is made into a tab and the \n is made into a newline. If you
>want the actual backslash you need \\:
>
> > cat('here\\tthere\n')
>
>So if you want the regex engine to see \. (which means a literal dot)
>then you need to write \\. so that the string processing sees \\ and
>converts it to \ to pass to the regex engine. If you write \. then it
>looks in its table, where it knows what to do with \t, \n, and others,
>but \. is not there (it is meaningful to regexes but not to string
>processing), so it gives you the warning. For your example you are using
>it in the replacement portion, where the \ in front of . does not do
>anything, which is why either works. If you are using it in the pattern
>to match, then \\. (which gets reduced to \.) matches a . (dot
>character) while . (without \) matches any single character (with some
>possible exceptions), so in some cases it may give different results.
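
The two stages described above (string parsing first, then the regex engine) can be seen by counting the characters R actually stores. This is only a minimal sketch; the variable names are illustrative, and the single-backslash form "\." is deliberately left out because it triggers the warning discussed in this thread (and may be rejected outright by other R versions).

```r
# Stage 1: string parsing. "\\" in source code becomes one stored backslash.
dot_pattern <- "\\."
n_stored <- nchar(dot_pattern)        # a backslash and a dot are stored
cat(dot_pattern, "\n")                # shows what the regex engine will receive

# "\t" is parsed into a single tab character; "\\t" stays a backslash plus "t".
n_tab     <- nchar("here\tthere")
n_literal <- nchar("here\\tthere")

# Stage 2: the regex engine. "\." matches only a literal dot, "." matches anything.
escaped_hits   <- grepl("\\.", c("abc", "a.c"))
unescaped_hits <- grepl(".",   c("abc", "a.c"))

print(c(n_stored, n_tab, n_literal))
print(escaped_hits)
print(unescaped_hits)
```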
>
>Hope this helps,
>
>
>
>--
>Gregory (Greg) L. Snow Ph.D.
>Statistical Data Center
>Intermountain Healthcare
>greg.snow at intermountainmail.org
>(801) 408-8111
>
>
>
> > -----Original Message-----
> > From: r-help-bounces at stat.math.ethz.ch
> > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Talbot Katz
> > Sent: Friday, August 31, 2007 12:30 PM
> > To: ligges at statistik.uni-dortmund.de
> > Cc: r-help at stat.math.ethz.ch
> > Subject: Re: [R] gsub warning message
> >
> > Thank you for the swift response. It looks like the code
> > works the same way with or without the "\\" in either the
> > search string: { "\\_+" or "_+" } or the replacement string:
> > { "\\." or "." }. I tested this in Windows and Linux
> > (although we're still on R 2.4.1 in Linux). It's not clear
> > to me why I can use either two slashes or no slash safely,
> > but not one slash, and it makes me vaguely uneasy.
> > Obviously, I need to review regular expressions, but my usual
> > sources, such as http://perldoc.perl.org/perlre.html, don't
> > seem to address this issue. I wonder whether there's a good
> > document explaining this.
> >
> > -- TMK --
> > 212-460-5430 home
> > 917-656-5351 cell
> >
> >
> > >From: Uwe Ligges <ligges at statistik.uni-dortmund.de>
> > >To: Talbot Katz <topkatz at msn.com>
> > >CC: r-help at stat.math.ethz.ch
> > >Subject: Re: [R] gsub warning message
> > >Date: Fri, 31 Aug 2007 18:04:39 +0200
> > >
> > >
> > >
> > >Talbot Katz wrote:
> > >>Hi.
> > >>
> > >>I am using R 2.5.1 on a Windows XP machine. Here is an example of a
> > >>piece of code I was running in older versions of R on the same
> > >>machine. I am looking for underscores and replacing them with
> > >>periods. This result is from R 2.4.1:
> > >>
> > >>>gsub ( "\\_+","\.","AAA_I")
> > >>[1] "AAA.I"
> > >>
> > >>Here is what I get in R 2.5.1:
> > >>
> > >>>gsub ( "\\_+","\.","AAA_I")
> > >>[1] "AAA.I"
> > >>Warning messages:
> > >>1: '\.' is an unrecognized escape in a character string
> > >>2: unrecognized escape removed from "\."
> > >>
> > >>I still get the same result, which is what I want, but now I get a
> > >>warning message. Am I actually doing something wrong that the
> > >>previous versions of R didn't warn me about? Or is this warning
> > >>message unwarranted? Is there a fully approved method for
> > >>getting the same functionality? Thanks!
> > >
> > >Yes, correct usage is either
> > > gsub ( "\\_+", ".", "AAA_I")
> > >or
> > > gsub ( "\\_+", "\\.", "AAA_I")
> > >
> > >Uwe Ligges
> > >
> > >
> > >
> > >>-- TMK --
> > >>212-460-5430 home
> > >>917-656-5351 cell
> > >>
> > >>______________________________________________
> > >>R-help at stat.math.ethz.ch mailing list
> > >>https://stat.ethz.ch/mailman/listinfo/r-help
> > >>PLEASE do read the posting guide
> > >>http://www.R-project.org/posting-guide.html
> > >>and provide commented, minimal, self-contained, reproducible code.
> >
>