[R] sub/grep question: extract year

Marc Girondot m@rc_grt @ending from y@hoo@fr
Thu Aug 9 09:57:48 CEST 2018


Hi everybody,

I have some questions about how sub works. I hope that someone has the 
answers:

1/ Why does the second example not return an empty string? There is no 
match.

subtext <- "-1980-"
sub(".*(1980).*", "\\1", subtext) # returns 1980
sub(".*(1981).*", "\\1", subtext) # returns -1980-
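
For reference, sub returns its input unchanged when the pattern does not match, which is why -1980- comes back. A minimal sketch of one way to get an empty string instead, assuming it is acceptable to test for a match first with grepl (the names pattern and result are only illustrative):

```r
subtext <- "-1980-"
pattern <- ".*(1981).*"

# sub() leaves the string untouched when nothing matches,
# so check for a match first and fall back to "" otherwise.
result <- ifelse(grepl(pattern, subtext),
                 sub(pattern, "\\1", subtext),
                 "")
result
```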

2/ According to the sub documentation, it replaces the first occurrence 
of a pattern. Why does it not return 1980?

subtext <- " 1980 1981 "
sub(".*(198[01]).*", "\\1", subtext) # returns 1981
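
A likely explanation is that the leading .* is greedy: the pattern matches the whole string once, and .* consumes as much as it can before the group, so the group ends up on the last year. A small sketch comparing this with a lazy quantifier, assuming perl = TRUE is acceptable (the names greedy and lazy are only illustrative):

```r
subtext <- " 1980 1981 "

# Greedy: the leading .* grabs as much as possible,
# so the capture group lands on the last year.
greedy <- sub(".*(198[01]).*", "\\1", subtext)

# Lazy: .*? grabs as little as possible,
# so the capture group lands on the first year.
lazy <- sub(".*?(198[01]).*", "\\1", subtext, perl = TRUE)

c(greedy = greedy, lazy = lazy)
```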

3/ I want to extract a year from text; I use:

subtext <- "bla 1980 bla"
# returns 1980
sub(".*[ \\.\\(-]([12][01289][0-9][0-9])[ \\.\\)-].*", "\\1", subtext)
subtext <- "bla 2010 bla"
# returns 2010
sub(".*[ \\.\\(-]([12][01289][0-9][0-9])[ \\.\\)-].*", "\\1", subtext)

but

subtext <- "bla 1010 bla"
# returns 1010
sub(".*[ \\.\\(-]([12][01289][0-9][0-9])[ \\.\\)-].*", "\\1", subtext)

I would like to exclude 1010 and other cases like it.

The pattern for the year itself would need to be:

18[0-9][0-9] or 19[0-9][0-9] or 200[0-9] or 201[0-9]

Is there a way to write such a pattern for grep?
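
One possibility is the alternation operator | inside the capture group. The following is only a sketch under that assumption, with 200[0-9] and 201[0-9] merged into 20[01][0-9]; the names year_pattern, texts and years are illustrative, and returning NA for non-matching strings is a choice made here, not something sub does by itself:

```r
# Alternation inside the group: 18xx, 19xx, or 2000-2019.
# Inside [...] the characters . ( ) need no escaping, and a
# trailing - is taken literally.
year_pattern <- ".*[ .(-](18[0-9]{2}|19[0-9]{2}|20[01][0-9])[ .)-].*"

texts <- c("bla 1980 bla", "bla 2010 bla", "bla 1010 bla")

# Extract the year where the pattern matches, NA elsewhere.
years <- ifelse(grepl(year_pattern, texts),
                sub(year_pattern, "\\1", texts),
                NA_character_)
years
```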

Thanks a lot

Marc


