[R] gsub syntax

Dimitris Rizopoulos dimitris.rizopoulos at med.kuleuven.be
Sun Nov 27 11:20:34 CET 2005


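You could use something like:

dates <- c("73", "74", "02", "1973", "1974", "2002")
# substr() is vectorised over 'start' and 'stop': start at character 1
# for 2-digit years and at character 3 for 4-digit years
nd <- nchar(dates)
substr(dates, ifelse(nd == 2, 1, 3), nd)
# [1] "73" "74" "02" "73" "74" "02"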
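
The original question asked for substr() to count from the end of the string. The same idea can be wrapped in a small helper that derives the start position from nchar(). This is only a sketch: last_n() is a hypothetical name, not a base R function, and it assumes a plain character vector without NA values.

```r
# Hypothetical helper (not part of base R): return the last n characters
# of each element of a character vector, counting from the string end.
last_n <- function(x, n) {
  len <- nchar(x)
  # pmax() clamps the start at 1 for strings shorter than n
  substr(x, pmax(len - n + 1, 1), len)
}

dates <- c("73", "74", "02", "1973", "1974", "2002")
last_n(dates, 2)
# [1] "73" "74" "02" "73" "74" "02"
```

Because the start position is computed per element, the helper does not depend on the years having exactly 2 or 4 digits.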


I hope it helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://www.med.kuleuven.be/biostat/
     http://www.student.kuleuven.be/~m0390867/dimitris.htm


----- Original Message ----- 
From: "John Logsdon" <j.logsdon at quantex-research.com>
To: <r-help at stat.math.ethz.ch>
Sent: Sunday, November 27, 2005 11:04 AM
Subject: [R] gsub syntax


> Hello
>
> I know that R's string functions are not as extensive as those of Unix,
> but I need to do some text handling totally within an R environment
> because the target is a Windows system which will not have the
> corresponding shell utilities: sed, awk, etc.
>
> Can anyone explain the following gsub phenomenon to me:
>
>> dates<-c("73","74","02","1973","1974","2002")
>
> I want to take just the last two digits where it is a 4-digit year and
> both digits when it is a 2-digit year.  I should be able to use substr,
> but measurement from the string end (with a negative counter or
> something) is not implemented:
>
>> substr(dates,3,4)
> [1] ""   ""   ""   "73" "74" "02"
>> substr(dates,-2,4)
> [1] "73"   "74"   "02"   "1973" "1974" "2002"
>> substr(dates,4,-2)
> [1] "" "" "" "" "" ""
>
> So I tried gsub:
>
>> gsub("[19|20]([0-9][0-9])","\\1",dates)
> [1] "73"  "74"  "02"  "973" "974" "002"
>
> As I understand it (and comparing with sed), the \\1 should take the
> first bracketed string, but clearly this doesn't work.  If I try what
> should also work:
>
>> gsub("[19|20]([0-9])([0-9])","\\1\\2",dates)
> [1] "73"  "74"  "02"  "973" "974" "002"
>
> On the other hand the following does work:
>
>> gsub("[19|20]([0-9])([0-9])","\\2",dates)
> [1] "73" "74" "02" "73" "74" "02"
>
> So it appears that the substitution takes one character extra to the
> left, but the following indicates that the lower limit of the selected
> range is also at fault:
>
>> s<-c("1","12","123","1234","12345","123456")
>> gsub("[12]([4-6]*)","",s)
> [1] ""     ""     "3"    "34"   "345"  "3456"
>
> Probably more elegant examples could be constructed that could home in
> on the issue.
>
> The version is R 2.0.1 on Linux so perhaps it is a little old now.
>
> Questions:
>
> 1) Am I misunderstanding the gsub use?
>
> 2) Was it a bug that has since been corrected?
>
> 3) Is it still a bug in the latest version?
>
> TIA
>
> John
>
> John Logsdon                               "Try to make things as simple
> Quantex Research Ltd, Manchester UK         as possible but not simpler"
> j.logsdon at quantex-research.com           a.einstein at relativity.org
> +44(0)161 445 4951/G:+44(0)7717758675       www.quantex-research.com
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html
> 
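
On question 1 from the original message: in a regular expression, square brackets form a bracket expression that matches exactly one character, so [19|20] matches a single 1, 9, |, 2 or 0 rather than the strings "19" or "20". In "1973" the pattern therefore matches "197" and the group \\1 is "97", which gives "973". Grouping with parentheses gives alternation instead. The sketch below shows both; the anchored pattern assumes every element is a 2- or 4-digit year and that 4-digit years start with 19 or 20.

```r
dates <- c("73", "74", "02", "1973", "1974", "2002")

# [19|20] is a bracket expression: it matches ONE character from the set
# {1, 9, |, 2, 0}, so in "1973" the whole pattern matches "197".
m <- regexpr("[19|20]([0-9][0-9])", dates)
regmatches(dates, m)   # only the elements that matched
# [1] "197" "197" "200"

# (19|20) is a group with alternation; '?' makes the century optional and
# the ^...$ anchors force the pattern to cover the whole string.
sub("^(19|20)?([0-9]{2})$", "\\2", dates)
# [1] "73" "74" "02" "73" "74" "02"

# The same rule explains the second example: [12] removes a single 1 or 2,
# and ([4-6]*) may match zero characters, so "1234" loses only "1" and "2".
gsub("[12]([4-6]*)", "", "1234")
# [1] "34"
```

Note that regmatches() was added to base R well after the R 2.0.1 mentioned in the question; the sub() and gsub() calls do not depend on it.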


Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm



