[R] Locating the starting position of the first number in a string
Peter Alspach
Peter.Alspach at plantandfood.co.nz
Mon Nov 2 21:32:50 CET 2015
Tena koe Jen
Not answering your question directly: if you are after these locations in order to split the IDs into columns, then you might like to consider strsplit; e.g.,
t(sapply(strsplit(ID, '_'), rbind))
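To illustrate what that call returns, here is a small sketch on three of the IDs from the original post. The names `parts` and `parts2` are only for illustration, and the `do.call(rbind, ...)` form is an equivalent alternative that assumes every ID splits into the same number of fields.

```r
# Three IDs taken from the original post
ID <- c("IBBS3_MSM_HN01209", "IBBS3_MSM_HN104213", "IBBS3_MSM_HMC44285")

# One row per ID, one column per underscore-separated field
parts <- t(sapply(strsplit(ID, "_"), rbind))

# Equivalent, provided every ID splits into the same number of fields
parts2 <- do.call(rbind, strsplit(ID, "_"))

parts[, 3]   # last field: "HN01209" "HN104213" "HMC44285"
```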
You could then split the last column. You state that there is a 5-digit number at the end. If that always holds, then split at a fixed offset from the end (i.e., starting at nchar(ID)-4), since you would then want "IBBS3_MSM_HN104213" (the fifth element in ID) to split into IBBS3, MSM, HN1 and 04213. However, if it isn't always 5 digits, then split at the first digit instead (i.e., HN and 104213).
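A minimal sketch of both options in base R follows; the variable names (`last`, `pos` and so on) are illustrative and not from the thread. Two points are worth keeping in mind: a digit range in a regular expression is written `[0-9]` (the class `[0:9]` matches only the characters `0`, `:` and `9`), and the first field, IBBS3, itself contains a digit, so the search is done on the last field rather than on the whole ID.

```r
ID <- c("IBBS3_MSM_HN01209", "IBBS3_MSM_HN104213", "IBBS3_MSM_HMC44285")

# Last underscore-separated field of each ID
last <- sapply(strsplit(ID, "_"), function(x) x[length(x)])

# Option 1: assume the number is always the final 5 characters
n <- nchar(last)
group_fixed  <- substr(last, 1, n - 5)    # "HN"    "HN1"   "HMC"
digits_fixed <- substr(last, n - 4, n)    # "01209" "04213" "44285"

# Option 2: split at the first digit, however many digits follow
pos <- as.integer(regexpr("[0-9]", last)) # -1 where no digit is found
group_first  <- substr(last, 1, pos - 1)  # "HN"    "HN"     "HMC"
digits_first <- substr(last, pos, n)      # "01209" "104213" "44285"

# Starting position of the number within the full ID
pos_in_id <- nchar(ID) - n + pos          # 13 13 14

# Same positions from a single pattern anchored at the end of the string
pos_trailing <- as.integer(regexpr("[0-9]+$", ID))
```

Option 1 and Option 2 differ only for IDs such as "IBBS3_MSM_HN104213", where the trailing run of digits is longer than five.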
HTH .....
Peter Alspach
-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Jennifer Sabatier
Sent: Tuesday, 3 November 2015 7:39 a.m.
To: r-help at r-project.org
Subject: [R] Locating the starting position of the first number in a string
Hi,
So, I've got a vector of strings that look like this:
ID <- c("IBBS3_MSM_HN01209","IBBS3_MSM_HN01210","IBBS3_MSM_HN01211",
"IBBS3_MSM_HN10212","IBBS3_MSM_HN104213","IBBS3_MSM_HN10214",
"IBBS3_MSM_HN44215","IBBS3_MSM_HN44216","IBBS3_MSM_HN44217",
"IBBS3_MSM_HN44218","IBBS3_MSM_HN44219","IBBS3_MSM_HN44220",
"IBBS3_MSM_HN44221","IBBS3_MSM_HN44222","IBBS3_MSM_HN44223",
"IBBS3_MSM_HN44224","IBBS3_MSM_HN44225","IBBS3_MSM_HN44226",
"IBBS3_MSM_HN44227","IBBS3_MSM_HN12228","IBBS3_MSM_HN12229",
"IBBS3_MSM_HN12230","IBBS3_MSM_HN12231","IBBS3_MSM_HN12232",
"IBBS3_MSM_HN12233","IBBS3_MSM_HN12234","IBBS3_MSM_HN12235",
"IBBS3_MSM_HN12236","IBBS3_MSM_HN12237","IBBS3_MSM_HN12238",
"IBBS3_MSM_HN12239","IBBS3_MSM_HN12240","IBBS3_MSM_HN12241",
"IBBS3_MSM_HN12242","IBBS3_MSM_HN12243","IBBS3_MSM_HN12244",
"IBBS3_MSM_HN12245","IBBS3_MSM_HN12246","IBBS3_MSM_HN12247",
"IBBS3_MSM_HN12248","IBBS3_MSM_HN12249","IBBS3_MSM_HN12250",
"IBBS3_MSM_HN12251","IBBS3_MSM_HN12252","IBBS3_MSM_HN12253",
"IBBS3_MSM_HN12254","IBBS3_MSM_HN12255","IBBS3_MSM_HN25256",
"IBBS3_MSM_HN25257","IBBS3_MSM_HN25258","IBBS3_MSM_HN25259",
"IBBS3_MSM_HN25260","IBBS3_MSM_HN25261","IBBS3_MSM_HN25262",
"IBBS3_MSM_HN25263","IBBS3_MSM_HN25264","IBBS3_MSM_HN25265",
"IBBS3_MSM_HN25266","IBBS3_MSM_HN25267","IBBS3_MSM_HN25268",
"IBBS3_MSM_HN25269","IBBS3_MSM_HN25270","IBBS3_MSM_HN25271",
"IBBS3_MSM_HN25272","IBBS3_MSM_HN25273","IBBS3_MSM_HN25274",
"IBBS3_MSM_HN25275","IBBS3_MSM_HN25276", "IBBS3_MSM_HN25277", "IBBS3_MSM_HN25278","IBBS3_MSM_HN25279","IBBS3_MSM_HN25280",
"IBBS3_MSM_HN25281","IBBS3_MSM_HN25282","IBBS3_MSM_HN25283",
"IBBS3_MSM_HN25284","IBBS3_MSM_HMC44285", "IBBS3_MSM_HMC44286", "IBBS3_MSM_HMC44287","IBBS3_MSM_HMC44288","IBBS3_MSM_HMC44289",
"IBBS3_MSM_HMC44290","IBBS3_MSM_HMC44291","IBBS3_MSM_HMC44292",
"IBBS3_MSM_HMC44293","IBBS3_MSM_HMC44294","IBBS3_MSM_HMC44295",
"IBBS3_MSM_HMC44296","IBBS3_MSM_HMC44297","IBBS3_MSM_HMC44298",
"IBBS3_MSM_HMC44299","IBBS3_MSM_HMC44300","IBBS3_MSM_HMC44301",
"IBBS3_MSM_HMC44302","IBBS3_MSM_HMC44303","IBBS3_MSM_HMC44304",
"IBBS3_MSM_HMC44305","IBBS3_MSM_HMC44306","IBBS3_MSM_HMC44307",
"IBBS3_MSM_HMC44309")
This is an ID that is in the following format: IBBS3_Type_Group#####
What I want to do is locate the starting position of Type, which is anywhere from 3 to 4 letters long (in this example it's either MSM or PWID), the starting position of Group which is 2-3 letters long (either HN or HMC), and finally the starting position of the 5-digit number.
I'm able to get Type and Group using the following:
TYPE_s <- sapply(c("MSM", "PWID"), regexpr, ID, ignore.case=TRUE)
GROUP_s <- sapply(c("HN", "HMC"), regexpr, ID, ignore.case=TRUE)
What I am having trouble with is getting the starting position of the 5-digit number.
I am trying:
DIGITS_s <- sapply("([0:9])", regexpr, ID, ignore.case=T)
But that just seems to look for the position of the first 0:
> DIGITS_s
([0:9])
[1,] 13
[2,] 13
[3,] 13
[4,] 14
[5,] 14
[6,] 14
[7,] -1
[8,] -1
[9,] -1
[10,] -1
[11,] 17
[12,] 17
[13,] -1
[14,] -1
[15,] -1
[16,] -1
[17,] -1
[18,] -1
[19,] -1
[20,] -1
[21,] 17
[22,] 17
[23,] -1
[24,] -1
[25,] -1
[26,] -1
[27,] -1
[28,] -1
[29,] -1
[30,] -1
[31,] 17
[32,] 17
[33,] -1
[34,] -1
[35,] -1
[36,] -1
[37,] -1
[38,] -1
[39,] -1
[40,] -1
[41,] 17
[42,] 17
[43,] -1
[44,] -1
[45,] -1
[46,] -1
[47,] -1
[48,] -1
[49,] -1
[50,] -1
[51,] 17
[52,] 17
[53,] -1
[54,] -1
[55,] -1
[56,] -1
[57,] -1
[58,] -1
[59,] -1
[60,] -1
[61,] 17
[62,] 17
[63,] -1
[64,] -1
[65,] -1
[66,] -1
[67,] -1
[68,] -1
[69,] -1
[70,] -1
[71,] 17
[72,] 17
[73,] -1
[74,] -1
[75,] -1
[76,] -1
[77,] -1
[78,] -1
[79,] -1
[80,] -1
[81,] 18
[82,] 17
[83,] 17
[84,] 17
[85,] 17
[86,] 17
[87,] 17
[88,] 17
[89,] 17
[90,] 17
[91,] 17
[92,] 17
[93,] 17
[94,] 17
[95,] 17
[96,] 17
[97,] 17
[98,] 17
[99,] 17
[100,] 17
So, clearly, this is wrong. I just would like to find the starting position of the first digit, no matter what it is.
It's probably easy, isn't it?
Best,
Jen