[R] Locating the starting position of the first number in a string
Jeff Newmiller
jdnewmil at dcn.davis.ca.us
Mon Nov 2 22:33:56 CET 2015
Also not answering your question directly, but this may provide some useful
ideas or results:
> library( gsubfn )
>
> DF <- setNames( data.frame( t( strapply( ID
+ , "^[^_]+_([A-Z]+)_([A-Z]+)([0-9]+)$"
+ , c
+ , simplify=TRUE
+ )
+ )
+ , stringsAsFactors = FALSE
+ )
+ , c( "Type", "Group", "Number" )
+ )
> str( DF )
'data.frame': 100 obs. of 3 variables:
$ Type : chr "MSM" "MSM" "MSM" "MSM" ...
$ Group : chr "HN" "HN" "HN" "HN" ...
$ Number: chr "01209" "01210" "01211" "10212" ...
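
For readers without gsubfn installed, the same split can be sketched in base R with regexec() and regmatches(). This is only an illustration under stated assumptions: the three-element ID sample is taken from the original post, the name DF2 is made up for this example, and every ID is assumed to match the pattern (a non-matching ID would yield an empty vector and break the row binding).

```r
# Small sample taken from the ID vector in the original post.
ID <- c("IBBS3_MSM_HN01209", "IBBS3_MSM_HN104213", "IBBS3_MSM_HMC44285")

# Same pattern as in the strapply() call above.
pattern <- "^[^_]+_([A-Z]+)_([A-Z]+)([0-9]+)$"

# regexec() records the full match plus each capture group;
# regmatches() returns them as one character vector per ID.
m <- regmatches(ID, regexec(pattern, ID))

# Drop the full match (first element) and bind the groups row-wise.
# DF2 is a hypothetical name used only in this sketch.
DF2 <- as.data.frame(do.call(rbind, lapply(m, `[`, -1)),
                     stringsAsFactors = FALSE)
names(DF2) <- c("Type", "Group", "Number")
str(DF2)
```

Number stays a character column here, as in the strapply() result, so leading zeros such as "01209" are preserved.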
On Tue, 3 Nov 2015, Peter Alspach wrote:
> Tena koe Jen
>
> Not answering your question: if you are after these locations in order to split the IDs into columns, then you might like to consider strsplit; e.g.,
>
> t(sapply(strsplit(ID, '_'), rbind))
>
> You could then split the last column. You state that there is a 5-digit number at the end. If this is correct, then use this feature (i.e., nchar(ID)-4) as you'd want "IBBS3_MSM_HN104213" (the fifth element in ID) to split to IBBS3, MSM, HN1 and 04213. However, if it isn't always 5 digits then split at the first number (i.e., HN and 104213).
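
The two ways of splitting the last field described above can be compared side by side. This is a minimal sketch, not Peter's code: the three-element ID sample comes from the original post, and the variable names (parts, last, group_fixed, and so on) are assumptions made for illustration.

```r
# Small sample taken from the ID vector in the original post.
ID <- c("IBBS3_MSM_HN01209", "IBBS3_MSM_HN104213", "IBBS3_MSM_HMC44285")

# Split on the literal underscore and bind the pieces into a 3-column matrix.
parts <- do.call(rbind, strsplit(ID, "_", fixed = TRUE))
last <- parts[, 3]

# Option 1: the number is always the final 5 characters.
n <- nchar(last)
group_fixed  <- substr(last, 1, n - 5)
number_fixed <- substr(last, n - 4, n)

# Option 2: the number starts at the first digit of the last field.
group_first  <- sub("[0-9].*$", "", last)
number_first <- sub("^[^0-9]*", "", last)

# Compare the two splits for each ID.
cbind(last, group_fixed, number_fixed, group_first, number_first)
```

The two options differ only for "IBBS3_MSM_HN104213": the fixed-width split gives HN1 and 04213, while the first-digit split gives HN and 104213.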
>
> HTH .....
>
> Peter Alspach
>
> -----Original Message-----
> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Jennifer Sabatier
> Sent: Tuesday, 3 November 2015 7:39 a.m.
> To: r-help at r-project.org
> Subject: [R] Locating the starting position of the first number in a string
>
> Hi,
>
>
> So, I've got a vector of strings that look like this:
> ID <- c("IBBS3_MSM_HN01209","IBBS3_MSM_HN01210","IBBS3_MSM_HN01211",
> "IBBS3_MSM_HN10212","IBBS3_MSM_HN104213","IBBS3_MSM_HN10214",
> "IBBS3_MSM_HN44215","IBBS3_MSM_HN44216","IBBS3_MSM_HN44217",
> "IBBS3_MSM_HN44218","IBBS3_MSM_HN44219","IBBS3_MSM_HN44220",
> "IBBS3_MSM_HN44221","IBBS3_MSM_HN44222","IBBS3_MSM_HN44223",
> "IBBS3_MSM_HN44224","IBBS3_MSM_HN44225","IBBS3_MSM_HN44226",
> "IBBS3_MSM_HN44227","IBBS3_MSM_HN12228","IBBS3_MSM_HN12229",
> "IBBS3_MSM_HN12230","IBBS3_MSM_HN12231","IBBS3_MSM_HN12232",
> "IBBS3_MSM_HN12233","IBBS3_MSM_HN12234","IBBS3_MSM_HN12235",
> "IBBS3_MSM_HN12236","IBBS3_MSM_HN12237","IBBS3_MSM_HN12238",
> "IBBS3_MSM_HN12239","IBBS3_MSM_HN12240","IBBS3_MSM_HN12241",
> "IBBS3_MSM_HN12242","IBBS3_MSM_HN12243","IBBS3_MSM_HN12244",
> "IBBS3_MSM_HN12245","IBBS3_MSM_HN12246","IBBS3_MSM_HN12247",
> "IBBS3_MSM_HN12248","IBBS3_MSM_HN12249","IBBS3_MSM_HN12250",
> "IBBS3_MSM_HN12251","IBBS3_MSM_HN12252","IBBS3_MSM_HN12253",
> "IBBS3_MSM_HN12254","IBBS3_MSM_HN12255","IBBS3_MSM_HN25256",
> "IBBS3_MSM_HN25257","IBBS3_MSM_HN25258","IBBS3_MSM_HN25259",
> "IBBS3_MSM_HN25260","IBBS3_MSM_HN25261","IBBS3_MSM_HN25262",
> "IBBS3_MSM_HN25263","IBBS3_MSM_HN25264","IBBS3_MSM_HN25265",
> "IBBS3_MSM_HN25266","IBBS3_MSM_HN25267","IBBS3_MSM_HN25268",
> "IBBS3_MSM_HN25269","IBBS3_MSM_HN25270","IBBS3_MSM_HN25271",
> "IBBS3_MSM_HN25272","IBBS3_MSM_HN25273","IBBS3_MSM_HN25274",
> "IBBS3_MSM_HN25275","IBBS3_MSM_HN25276","IBBS3_MSM_HN25277",
> "IBBS3_MSM_HN25278","IBBS3_MSM_HN25279","IBBS3_MSM_HN25280",
> "IBBS3_MSM_HN25281","IBBS3_MSM_HN25282","IBBS3_MSM_HN25283",
> "IBBS3_MSM_HN25284","IBBS3_MSM_HMC44285","IBBS3_MSM_HMC44286",
> "IBBS3_MSM_HMC44287","IBBS3_MSM_HMC44288","IBBS3_MSM_HMC44289",
> "IBBS3_MSM_HMC44290","IBBS3_MSM_HMC44291","IBBS3_MSM_HMC44292",
> "IBBS3_MSM_HMC44293","IBBS3_MSM_HMC44294","IBBS3_MSM_HMC44295",
> "IBBS3_MSM_HMC44296","IBBS3_MSM_HMC44297","IBBS3_MSM_HMC44298",
> "IBBS3_MSM_HMC44299","IBBS3_MSM_HMC44300","IBBS3_MSM_HMC44301",
> "IBBS3_MSM_HMC44302","IBBS3_MSM_HMC44303","IBBS3_MSM_HMC44304",
> "IBBS3_MSM_HMC44305","IBBS3_MSM_HMC44306","IBBS3_MSM_HMC44307",
> "IBBS3_MSM_HMC44309")
>
>
>
>
> This is an ID that is in the following format: IBBS3_Type_Group#####
>
>
> What I want to do is locate the starting position of Type, which is anywhere from 3 to 4 letters long (in this example it's either MSM or PWID), the starting position of Group which is 2-3 letters long (either HN or HMC), and finally the starting position of the 5-digit number.
>
>
> I'm able to get Type and Group using the following:
>
>
> TYPE_s <- sapply(c("MSM", "PWID"), regexpr, ID, ignore.case=T)
>
> GROUP_s <- (sapply(c("HN", "HMC"), regexpr, ID, ignore.case=T))
>
>
> What I am having trouble with is getting the starting position of the 5-digit number.
>
>
> I am trying:
>
>
> DIGITS_s <- sapply("([0:9])", regexpr, ID, ignore.case=T)
>
>
> But that just seems to look for the position of the first 0.:
>
>
>> DIGITS_s
>
> ([0:9])
>
> [1,] 13
>
> [2,] 13
>
> [3,] 13
>
> [4,] 14
>
> [5,] 14
>
> [6,] 14
>
> [7,] -1
>
> [8,] -1
>
> [9,] -1
>
> [10,] -1
>
> [11,] 17
>
> [12,] 17
>
> [13,] -1
>
> [14,] -1
>
> [15,] -1
>
> [16,] -1
>
> [17,] -1
>
> [18,] -1
>
> [19,] -1
>
> [20,] -1
>
> [21,] 17
>
> [22,] 17
>
> [23,] -1
>
> [24,] -1
>
> [25,] -1
>
> [26,] -1
>
> [27,] -1
>
> [28,] -1
>
> [29,] -1
>
> [30,] -1
>
> [31,] 17
>
> [32,] 17
>
> [33,] -1
>
> [34,] -1
>
> [35,] -1
>
> [36,] -1
>
> [37,] -1
>
> [38,] -1
>
> [39,] -1
>
> [40,] -1
>
> [41,] 17
>
> [42,] 17
>
> [43,] -1
>
> [44,] -1
>
> [45,] -1
>
> [46,] -1
>
> [47,] -1
>
> [48,] -1
>
> [49,] -1
>
> [50,] -1
>
> [51,] 17
>
> [52,] 17
>
> [53,] -1
>
> [54,] -1
>
> [55,] -1
>
> [56,] -1
>
> [57,] -1
>
> [58,] -1
>
> [59,] -1
>
> [60,] -1
>
> [61,] 17
>
> [62,] 17
>
> [63,] -1
>
> [64,] -1
>
> [65,] -1
>
> [66,] -1
>
> [67,] -1
>
> [68,] -1
>
> [69,] -1
>
> [70,] -1
>
> [71,] 17
>
> [72,] 17
>
> [73,] -1
>
> [74,] -1
>
> [75,] -1
>
> [76,] -1
>
> [77,] -1
>
> [78,] -1
>
> [79,] -1
>
> [80,] -1
>
> [81,] 18
>
> [82,] 17
>
> [83,] 17
>
> [84,] 17
>
> [85,] 17
>
> [86,] 17
>
> [87,] 17
>
> [88,] 17
>
> [89,] 17
>
> [90,] 17
>
> [91,] 17
>
> [92,] 17
>
> [93,] 17
>
> [94,] 17
>
> [95,] 17
>
> [96,] 17
>
> [97,] 17
>
> [98,] 17
>
> [99,] 17
>
> [100,] 17
>
>
> So, clearly, this is wrong. I just would like to find the starting position of the first digit, no matter what it is.
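
A note on why the pattern behaves this way, with one possible fix. Inside square brackets a colon is a literal character, so "[0:9]" matches "0", ":" or "9" rather than a range; the range is written "[0-9]". Because every ID starts with "IBBS3", a bare "[0-9]" would find the 3 in the prefix, so the sketch below anchors the digits to the end of the string instead. The three-element ID sample is taken from the post; the names wrong_class and first_any are assumptions made for illustration.

```r
# Small sample taken from the ID vector in the original post.
ID <- c("IBBS3_MSM_HN01209", "IBBS3_MSM_HN44215", "IBBS3_MSM_HMC44285")

# "[0:9]" is the set {"0", ":", "9"}; IDs without a 0 or 9 give -1.
wrong_class <- as.integer(regexpr("[0:9]", ID))

# "[0-9]" is a true digit range, but it hits the "3" in "IBBS3" first.
first_any <- as.integer(regexpr("[0-9]", ID))

# Anchoring the run of digits to the end of the string gives the
# starting position of the trailing number.
DIGITS_s <- as.integer(regexpr("[0-9]+$", ID))

# Show the three results next to the IDs.
data.frame(ID, wrong_class, first_any, DIGITS_s)
```

If the number is known to be exactly five digits, "[0-9]{5}$" could be used in place of "[0-9]+$".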
>
> It's probably easy, isn't it?
>
> Best,
>
> Jen
>
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
---------------------------------------------------------------------------
Jeff Newmiller The ..... ..... Go Live...
DCN:<jdnewmil at dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k