[R] Extracting first number after * in a character vector

Michael Hannon jmhannon.ucdavis at gmail.com
Tue Jan 24 03:31:59 CET 2017


I don't know about elegant, but I think the code below does the trick.

-- Mike

> foo <- c("     1 X[0,SMITH]   *              0             0             1 ",
+  "     2 X[0,JOHNSON] *              0             0             1 ",
+  "     3 X[0,WILLIAMS]              *              1             0 1 ",
+  "     4 X[0,JONES]   *              0             0             1 ",
+  .... [TRUNCATED]

> as.numeric(gsub("^[^*]+[*][^0-9]+([01]).*$", "\\1", foo))
[1] 0 0 1 0 0 0 0 0 0
>
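
The pattern above captures only a single 0 or 1, which is enough for the data shown. If the first value after the asterisk could be a longer number, or if some strings might have no asterisk at all, a slightly more general variant might look like the sketch below. The helper name first_after_star and the sample vector are made up for illustration and are not from the original post.

```r
# Hypothetical sample input, modelled on the strings in this thread.
foo2 <- c("     1 X[0,SMITH]   *              0             0             1 ",
          "     3 X[0,WILLIAMS]              *             12             0 1 ",
          "     4 X[0,JONES]                  7             0             1 ")

# Hypothetical helper: first run of digits after the first '*', NA if none.
first_after_star <- function(x) {
  # Skip to the first '*', skip any non-digits, then capture one or more digits.
  pat <- "^[^*]*[*][^0-9]*([0-9]+).*$"
  hit <- grepl(pat, x)                  # strings that have '*' followed by a digit
  out <- rep(NA_real_, length(x))       # default to NA when the pattern does not match
  out[hit] <- as.numeric(sub(pat, "\\1", x[hit]))
  out
}

first_after_star(foo2)
# expected: 0 12 NA
```

Checking with grepl() first matters because sub() returns the input unchanged when the pattern does not match, and as.numeric() would then emit a coercion warning.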

On Mon, Jan 23, 2017 at 1:27 PM, Jim Lemon <drjimlemon at gmail.com> wrote:
> Hi Abhinaba,
> I'm sure that someone will post a terrifyingly elegant regular
> expression that does this, but:
>
>  ardat<-
>  c("     1 X[0,SMITH]   *              0             0             1 ",
>  ...
> numpoststar<-function(x) {
>  xsplit<-unlist(strsplit(x,""))
>  starpos<-which(xsplit=="*")
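>  # watch out for a missing asterisk: without this check the loop below never ends
>  if(length(starpos)) {
>   digits<-c("0","1","2","3","4","5","6","7","8","9")
>   # use the first asterisk, and stop at the end of the string if no digit follows
>   starpos<-starpos[1]
>   while(starpos<=length(xsplit) && !(xsplit[starpos] %in% digits)) starpos<-starpos+1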
>   return(as.numeric(xsplit[starpos]))
>  }
>  return(NA)
> }
>
> for(i in seq_along(ardat)) print(numpoststar(ardat[i]))
>
> The observant will wonder why I didn't use sapply. Because for some
> reason it returned the original strings rather than the numbers. I
> dunno.
>
> Jim
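
On the sapply puzzle in Jim's message: one likely explanation is that sapply(), when given a character vector, uses the input strings as names for the result (USE.NAMES = TRUE is the default). The numbers are there, but each one prints underneath its long name, so the output looks like the original strings. A small sketch follows, with a hypothetical one-line stand-in for numpoststar() to keep it short.

```r
# Hypothetical stand-in for numpoststar(): first digit after the first '*'.
first_digit_after_star <- function(x) {
  as.numeric(sub("^[^*]*[*][^0-9]*([0-9]).*$", "\\1", x))
}

ardat <- c("     1 X[0,SMITH]   *              0             0             1 ",
           "     3 X[0,WILLIAMS]              *              1             0 1 ")

# Default: the result is a numeric vector named by the input strings.
named <- sapply(ardat, first_digit_after_star)
names(named)

# Dropping the names leaves the bare numbers.
plain <- sapply(ardat, first_digit_after_star, USE.NAMES = FALSE)
plain
# expected: 0 1
```

Calling unname() on the default result should give the same bare vector.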
>
> On Mon, Jan 23, 2017 at 11:29 PM, Abhinaba Roy <abhinabaroy09 at gmail.com> wrote:
>> Hi,
>>
>> How do I extract the first number after '*' in a vector?
>>
>> The vector is given below
>>
>>> dput(out[1:10])
>> c("     1 X[0,SMITH]   *              0             0             1 ",
>> "     2 X[0,JOHNSON] *              0             0             1 ",
>> "     3 X[0,WILLIAMS]", "                    *              1             0             1 ",
>> "     4 X[0,JONES]   *              0             0             1 ",
>> "     5 X[0,BROWN]   *              0             0             1 ",
>> "     6 X[0,DAVIS]   *              0             0             1 ",
>> "     7 X[0,MILLER]  *              0             0             1 ",
>> "     8 X[0,WILSON]  *              0             0             1 ",
>> "     9 X[0,MOORE]   *              0             0             1 "
>> )
>>
>> I want a vector with the first number after the asterisk.
>>
>> So the output would be a vector such as (0,0,1,0,0,0,0,0,0,0).
>>
>> How can I do it in R?
>>
>> Best,
>> Abhinaba
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.


