[R] using regular expressions to retrieve a digit-digit-dot structure from a string
Wacek Kusnierczyk
Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Tue Jun 9 09:04:47 CEST 2009
Gabor Grothendieck wrote:
> On Mon, Jun 8, 2009 at 7:18 PM, Wacek
> Kusnierczyk<Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> wrote:
>
>> Gabor Grothendieck wrote:
>>
>>> Try this. See ?regex for more.
>>>
>>>
>>>
>>>> x <- 'This happened in the 21. century." (the dot behind 21 is'
>>>> regexpr("(?![0-9]+)[.]", x, perl = TRUE)
>>>>
>>>>
>>> [1] 24
>>> attr(,"match.length")
>>> [1] 1
>>>
>>>
>> yes, but
>>
>> gregexpr('(?![0-9]+)[.]', 'a. 1. a1.', perl=TRUE)
>> # 2 5 9
>>
>
> Yes, it should be:
>
>
>> gregexpr('(?<=[0-9])[.]', 'a. 1. a1.', perl=TRUE)
>>
> [[1]]
> [1] 5 9
> attr(,"match.length")
> [1] 1 1
>
> which displays the position of every dot that is preceded
> immediately by a digit. Or just replace gregexpr with regexpr
> if it's intended that it match only one.
>
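i guess what was needed was something like

gregexpr('(?<=\\b[0-9]+)[.]', 'a. 1. a1.', perl=TRUE)
# intended result: 5

which won't work, however, because pcre does not support variable-width
lookbehinds: the [0-9]+ inside the lookbehind can match any number of
digits, so the pattern is rejected when it is compiled.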
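
a minimal sketch of how the failure shows up, assuming a current R build
with perl=TRUE backed by PCRE. the variable name res is only for
illustration, and the exact wording of the error message depends on the
PCRE version, so it is not reproduced here:

```r
# The unbounded [0-9]+ inside (?<=...) makes the pattern fail to compile,
# so gregexpr() signals an error instead of returning match positions.
res <- tryCatch(
  suppressWarnings(
    gregexpr('(?<=\\b[0-9]+)[.]', 'a. 1. a1.', perl = TRUE)
  ),
  error = function(e) e  # hand back the condition object for inspection
)

inherits(res, "error")   # TRUE when the pattern was rejected
```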
>
>> which, i guess, is not what you want. if what you want is to match all
>> and only dots that follow at least one digit preceded by a word
>> boundary, then the following should do, as far as i can see:
>>
>> gregexpr('\\b[0-9]+\\K[.]', 'a. 1. a1.', perl=TRUE)
>> # 5
>>
>> vQ
>>
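
if \K is not an option, the same dot positions can probably be recovered
without any lookbehind by matching the digits and the dot together and
taking the last character of each match. this is only a sketch: the
names s, m and dot_pos are made up, and the test string is the one from
above extended with ' 21.' to show a multi-digit case:

```r
s <- 'a. 1. a1. 21.'

# Match a whole run of digits that starts at a word boundary, plus the dot.
m <- gregexpr('\\b[0-9]+[.]', s, perl = TRUE)[[1]]

# The dot is the last character of each match: start + length - 1.
dot_pos <- as.integer(m) + attr(m, "match.length") - 1L
dot_pos
# 5 13
```

the dot in 'a1.' is skipped because there is no word boundary between
'a' and '1'.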