[R] using regular expressions to retrieve a digit-digit-dot structure from a string

Marc Schwartz marc_schwartz at me.com
Mon Jun 8 19:43:23 CEST 2009


On Jun 8, 2009, at 12:34 PM, Marc Schwartz wrote:

>
> On Jun 8, 2009, at 9:15 AM, Mark Heckmann wrote:
>
>> Hi,
>>
>>
>>
>> I need to recognize itemization structures in strings which follow the
>> format "digit-digit-dot", e.g.:
>>
>>
>>
>> 1.
>>
>> 2.
>>
>> 19.
>>
>> 211.
>>
>>
>>
>> Given the string " This happened in the 21. century." (in German, the dot
>> after 21 is used instead of "21st"), I want to know where the dots are, but
>> I do not want the 21.-dot to be returned as well.
>>
>>
>>
>> I am not good at regular expressions. How can I retrieve or recognize dots
>> excluding the digit-digit-dot structure?
>>
>>
>>
>> TIA, Mark
>>
>
> vec <- c("1.", "2.", "19.", "211.", "This happened in the 21. century")
>
> > grep("^[0-9]+\\.", vec, value = TRUE)
> [1] "1."   "2."   "19."  "211."
>
>
> The regex "^[0-9]+\\." is interpreted as "match one or more digits
> followed by a literal period, only at the beginning of the string" (the
> backslash is doubled because the pattern is written as an R string
> literal). The caret '^' anchors the match to the beginning of the string,
> so a sequence of digits followed by a period in the middle of the string
> will not match.

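I misread the last part of your query: you want to locate the dots that
are not part of a digit-dot structure, whereas the grep() call above
returns the elements that begin with one. I see that Henrique and Gabor
have provided what appear to be correct solutions.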
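
For what it is worth, here is a minimal sketch of one way to locate dots
that are not immediately preceded by a digit. It is not necessarily what
Henrique or Gabor posted; it assumes a Perl-style negative lookbehind is
acceptable, and the string 'x' is made up for illustration, adapted from
the example in the question.

```r
# Hypothetical example string, adapted from the original question
x <- "This happened in the 21. century."

# Positions of all literal dots in the string
all_dots <- gregexpr("\\.", x)[[1]]

# Positions of dots NOT immediately preceded by a digit.
# The negative lookbehind (?<![0-9]) requires perl = TRUE.
free_dots <- gregexpr("(?<![0-9])\\.", x, perl = TRUE)[[1]]

as.integer(all_dots)   # 24 33
as.integer(free_dots)  # 33
```

Note that gregexpr() returns -1 when nothing matches, so the result
should be checked for that value before it is used as a position.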

Sorry for the confusion.

Marc



