[R] parsing strings

jim holtman jholtman at gmail.com
Tue Jul 10 01:40:25 CEST 2007


Is this what you want:

> x <- "A10B10A10  B5AB 10 CD 12A10CD2EF3"
> x <- gsub(" ", "", x)  # remove blanks
> y <- gregexpr("[A-Z]+\\s*[0-9]+", x )[[1]]
>
> substring(x, y, y + attr(y, 'match.length') - 1)
[1] "A10"  "B10"  "A10"  "B5"   "AB10" "CD12" "A10"  "CD2"  "EF3"
>


On 7/9/07, Drescher, Michael (MNR) <michael.drescher at ontario.ca> wrote:
> Hi All,
>
>
>
> I have strings made up of an unknown number of letters, digits, and
> spaces. Strings always start with one or two letters, and always end
> with one or two digits. A set of letters (one or two letters) is always
> followed by a set of digits (one or two digits), possibly with one or
> more spaces between the sets of letters and digits. A set of letters
> always belongs to the following set of digits and I want to parse the
> strings into these groups. As an example, the strings and the desired
> parsing results could look like this:
>
>
>
> A10B10, desired parsing result: A10 and B10
>
> A10  B5, desired parsing result: A10 and B5
>
> AB 10 CD 12, desired parsing result: AB10 and CD12
>
> A10CD2EF3, desired parsing result: A10, CD2, and EF3
>
>
>
> I assume that it is possible to search a string for letters and digits
> and then break the string where letters are followed by digits, however
> I am a bit clueless about how I could use, e.g., the 'charmatch' or
> 'parse' commands to achieve this.
>
>
>
> Thanks a lot in advance for your help.
>
>
>
> Best, Michael
>
>
>
>
>
>
>
> Michael Drescher
>
> Ontario Forest Research Institute
>
> Ontario Ministry of Natural Resources
>
> 1235 Queen St East
>
> Sault Ste Marie, ON, P6A 2E3
>
> Tel: (705) 946-7406
>
> Fax: (705) 946-2030
>
>
>
>
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?



More information about the R-help mailing list