[R] Locating the starting position of the first number in a string

Mon Nov 2 21:32:50 CET 2015

Tena koe Jen

Not answering your question: if you are after these locations in order to split the IDs in columns, then you might like to consider strsplit; e.g.,

t(sapply(strsplit(ID, '_'), rbind))

You could then split the last column.  You state that there is a 5-digit number at the end.  If this is correct, then use this feature: the number starts at position nchar(ID)-4, which splits "IBBS3_MSM_HN104213" (the fifth element in ID) into IBBS3, MSM, HN1 and 04213.  However, if it isn't always 5 digits, then split the last column at its first digit instead (i.e., HN and 104213).
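
A minimal sketch of both options, using a three-element subset of ID for brevity. The names parts, last, grp5, num5, grp and num are made up for illustration, and do.call(rbind, ...) is used here in place of t(sapply(...)); it should give the same character matrix.

```r
# Small subset of the IDs from the original post
ID <- c("IBBS3_MSM_HN01209", "IBBS3_MSM_HN104213", "IBBS3_MSM_HMC44285")

# One row per ID, one column per '_'-separated field
parts <- do.call(rbind, strsplit(ID, "_", fixed = TRUE))
last  <- parts[, 3]

# Option 1: assume the number is always the final 5 characters
n    <- nchar(last)
grp5 <- substr(last, 1, n - 5)
num5 <- substr(last, n - 4, n)

# Option 2: split at the first digit, whatever the length of the number
grp <- sub("[0-9].*$", "", last)   # drop everything from the first digit on
num <- sub("^[^0-9]*", "", last)   # drop the leading non-digits

cbind(parts[, 1:2], grp5, num5, grp, num)
```

The two options only disagree on the second ID, where the number has six digits: option 1 gives HN1 and 04213, option 2 gives HN and 104213.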

HTH .....

Peter Alspach

-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Jennifer Sabatier
Sent: Tuesday, 3 November 2015 7:39 a.m.
To: r-help at r-project.org
Subject: [R] Locating the starting position of the first number in a string

Hi,

So, I've got a vector of strings that look like this:
ID <- c("IBBS3_MSM_HN01209","IBBS3_MSM_HN01210","IBBS3_MSM_HN01211",
"IBBS3_MSM_HN10212","IBBS3_MSM_HN104213","IBBS3_MSM_HN10214",
"IBBS3_MSM_HN44215","IBBS3_MSM_HN44216","IBBS3_MSM_HN44217",
"IBBS3_MSM_HN44218","IBBS3_MSM_HN44219","IBBS3_MSM_HN44220",
"IBBS3_MSM_HN44221","IBBS3_MSM_HN44222","IBBS3_MSM_HN44223",
"IBBS3_MSM_HN44224","IBBS3_MSM_HN44225","IBBS3_MSM_HN44226",
"IBBS3_MSM_HN44227","IBBS3_MSM_HN12228","IBBS3_MSM_HN12229",
"IBBS3_MSM_HN12230","IBBS3_MSM_HN12231","IBBS3_MSM_HN12232",
"IBBS3_MSM_HN12233","IBBS3_MSM_HN12234","IBBS3_MSM_HN12235",
"IBBS3_MSM_HN12236","IBBS3_MSM_HN12237","IBBS3_MSM_HN12238",
"IBBS3_MSM_HN12239","IBBS3_MSM_HN12240","IBBS3_MSM_HN12241",
"IBBS3_MSM_HN12242","IBBS3_MSM_HN12243","IBBS3_MSM_HN12244",
 "IBBS3_MSM_HN12245","IBBS3_MSM_HN12246","IBBS3_MSM_HN12247",
 "IBBS3_MSM_HN12248","IBBS3_MSM_HN12249","IBBS3_MSM_HN12250",
 "IBBS3_MSM_HN12251","IBBS3_MSM_HN12252","IBBS3_MSM_HN12253",
 "IBBS3_MSM_HN12254","IBBS3_MSM_HN12255","IBBS3_MSM_HN25256",
 "IBBS3_MSM_HN25257","IBBS3_MSM_HN25258","IBBS3_MSM_HN25259",
"IBBS3_MSM_HN25260","IBBS3_MSM_HN25261","IBBS3_MSM_HN25262",
"IBBS3_MSM_HN25263","IBBS3_MSM_HN25264","IBBS3_MSM_HN25265",
"IBBS3_MSM_HN25266","IBBS3_MSM_HN25267","IBBS3_MSM_HN25268",
"IBBS3_MSM_HN25269","IBBS3_MSM_HN25270","IBBS3_MSM_HN25271",
"IBBS3_MSM_HN25272","IBBS3_MSM_HN25273","IBBS3_MSM_HN25274",
"IBBS3_MSM_HN25275","IBBS3_MSM_HN25276", "IBBS3_MSM_HN25277", "IBBS3_MSM_HN25278","IBBS3_MSM_HN25279","IBBS3_MSM_HN25280",
"IBBS3_MSM_HN25281","IBBS3_MSM_HN25282","IBBS3_MSM_HN25283",
"IBBS3_MSM_HN25284","IBBS3_MSM_HMC44285",  "IBBS3_MSM_HMC44286", "IBBS3_MSM_HMC44287","IBBS3_MSM_HMC44288","IBBS3_MSM_HMC44289",
"IBBS3_MSM_HMC44290","IBBS3_MSM_HMC44291","IBBS3_MSM_HMC44292",
"IBBS3_MSM_HMC44293","IBBS3_MSM_HMC44294","IBBS3_MSM_HMC44295",
"IBBS3_MSM_HMC44296","IBBS3_MSM_HMC44297","IBBS3_MSM_HMC44298",
"IBBS3_MSM_HMC44299","IBBS3_MSM_HMC44300","IBBS3_MSM_HMC44301",
"IBBS3_MSM_HMC44302","IBBS3_MSM_HMC44303","IBBS3_MSM_HMC44304",
"IBBS3_MSM_HMC44305","IBBS3_MSM_HMC44306","IBBS3_MSM_HMC44307",
"IBBS3_MSM_HMC44309")

This is an ID that is in the following format:  IBBS3_Type_Group#####

What I want to do is locate the starting position of Type, which is anywhere from 3 to 4 letters long (in this example it's either MSM or PWID), the starting position of Group which is 2-3 letters long (either HN or HMC), and finally the starting position of the 5-digit number.

I'm able to get Type and Group using the following:

TYPE_s <- sapply(c("MSM", "PWID"), regexpr, ID, ignore.case=T)

GROUP_s <- (sapply(c("HN", "HMC"), regexpr, ID, ignore.case=T))

What I am having trouble with is getting the starting position of the 5-digit number.

I am trying:

DIGITS_s <- sapply("([0:9])", regexpr, ID, ignore.case=T)

But that just seems to look for the position of the first 0:

> DIGITS_s
       ([0:9])
  [1,]      13
  [2,]      13
  [3,]      13
  [4,]      14
  [5,]      14
  [6,]      14
  [7,]      -1
  [8,]      -1
  [9,]      -1
 [10,]      -1
 [11,]      17
 [12,]      17
 [13,]      -1
 [14,]      -1
 [15,]      -1
 [16,]      -1
 [17,]      -1
 [18,]      -1
 [19,]      -1
 [20,]      -1
 [21,]      17
 [22,]      17
 [23,]      -1
 [24,]      -1
 [25,]      -1
 [26,]      -1
 [27,]      -1
 [28,]      -1
 [29,]      -1
 [30,]      -1
 [31,]      17
 [32,]      17
 [33,]      -1
 [34,]      -1
 [35,]      -1
 [36,]      -1
 [37,]      -1
 [38,]      -1
 [39,]      -1
 [40,]      -1
 [41,]      17
 [42,]      17
 [43,]      -1
 [44,]      -1
 [45,]      -1
 [46,]      -1
 [47,]      -1
 [48,]      -1
 [49,]      -1
 [50,]      -1
 [51,]      17
 [52,]      17
 [53,]      -1
 [54,]      -1
 [55,]      -1
 [56,]      -1
 [57,]      -1
 [58,]      -1
 [59,]      -1
 [60,]      -1
 [61,]      17
 [62,]      17
 [63,]      -1
 [64,]      -1
 [65,]      -1
 [66,]      -1
 [67,]      -1
 [68,]      -1
 [69,]      -1
 [70,]      -1
 [71,]      17
 [72,]      17
 [73,]      -1
 [74,]      -1
 [75,]      -1
 [76,]      -1
 [77,]      -1
 [78,]      -1
 [79,]      -1
 [80,]      -1
 [81,]      18
 [82,]      17
 [83,]      17
 [84,]      17
 [85,]      17
 [86,]      17
 [87,]      17
 [88,]      17
 [89,]      17
 [90,]      17
 [91,]      17
 [92,]      17
 [93,]      17
 [94,]      17
 [95,]      17
 [96,]      17
 [97,]      17
 [98,]      17
 [99,]      17
[100,]      17

So, clearly, this is wrong.  I just would like to find the starting position of the first digit, no matter what it is.
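
For reference, a hedged sketch of what the pattern above is likely doing. Inside a bracket expression a range is written with a hyphen, so [0:9] matches only the three characters 0, : and 9, whereas [0-9] (or [[:digit:]]) matches any digit. Note that a bare [0-9] would hit the 3 in IBBS3 first, so the sketch assumes the number is the run of digits at the very end of the string and anchors the pattern with $. regexpr is already vectorised over the strings, so sapply is not needed. The four-element ID vector and the names wrong, naive and first_digit are for illustration only.

```r
ID <- c("IBBS3_MSM_HN01209", "IBBS3_MSM_HN10212",
        "IBBS3_MSM_HN44215", "IBBS3_MSM_HMC44285")

# "[0:9]" is the set {0, :, 9}; regexpr returns -1 when none of them occurs
wrong <- as.integer(regexpr("[0:9]", ID))

# "[0-9]" is the range 0 to 9, but the first digit in each ID is the 3 in IBBS3
naive <- as.integer(regexpr("[0-9]", ID))

# Start of the trailing run of digits, i.e. the number after the Group letters
first_digit <- as.integer(regexpr("[0-9]+$", ID))

data.frame(ID, wrong, naive, first_digit)
```

If the number is not guaranteed to sit at the end of the string, the same idea can be applied to the last '_'-separated field instead of the whole ID.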

It's probably easy, isn't it?

Best,

Jen

