[R] Better use of regex

Thu Sep 15 18:38:31 CEST 2016

Base:

    Filter(Negate(is.na),
           sapply(regmatches(dimInfo, regexec("HS_(.{1})", dimInfo)), "[", 2))

Modernverse:

    library(stringi)
    library(purrr)

    stri_match_first_regex(dimInfo, "HS_(.{1})")[,2] %>%
      discard(is.na)

Both use a capture group to find the matches and return only the captured
character. Elements without a match come back as NA, which is why both
versions filter on is.na. The "{1}" isn't strictly necessary, but I include
it to show that you can capture whatever length you want, in this case just
1 char.
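
For what it's worth, here is a self-contained sketch of the base version run
against the example vector from the original post. The intermediate names
(m, first_char, result, two) are mine, and I'm assuming "HS_(.)" is an
acceptable stand-in for "HS_(.{1})":

```r
dimInfo <- c("RecordID", "oppID", "position", "key", "operational",
             "IsSelected", "score", "item_1_HS_conv_ovrl_scr",
             "item_1_HS_elab_ovrl_scr", "item_1_HS_org_ovrl_scr")

# regexec() records the full match plus each capture group per element;
# regmatches() yields character(0) for elements with no match.
m <- regmatches(dimInfo, regexec("HS_(.)", dimInfo))

# Position 2 is the first capture group; "[" returns NA for non-matches.
first_char <- sapply(m, "[", 2)

# Drop the NAs left by elements that do not contain "HS_".
result <- first_char[!is.na(first_char)]
print(result)  # "c" "e" "o"

# Changing the quantifier changes how many characters are captured.
two <- sapply(regmatches(dimInfo, regexec("HS_(.{2})", dimInfo)), "[", 2)
two <- two[!is.na(two)]
print(two)     # "co" "el" "or"
```

Indexing with !is.na() does the same job as Filter(Negate(is.na), ...); use
whichever reads better to you.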

On Thu, Sep 15, 2016 at 12:17 PM, Doran, Harold <HDoran at air.org> wrote:

> I have produced a terribly inefficient piece of code. In the end, it
> gives exactly what I need, but clumsily steps through multiple steps which
> I'm sure could be more efficiently reduced.
>
> Below is a reproducible example. What I have to begin with is a character
> vector, dimInfo. What I want to do is parse this vector to 1) find the
> elements containing 'HS' and 2) grab *only* the first character after the
> "HS_". The final line of code in the example gives what I need.
>
> Any suggestions on a better approach?
>
> Harold
>
>
> dimInfo <- c("RecordID", "oppID", "position", "key", "operational",
> "IsSelected",
> "score", "item_1_HS_conv_ovrl_scr", "item_1_HS_elab_ovrl_scr",
> "item_1_HS_org_ovrl_scr")
>
> ff <- dimInfo[grep('HS', dimInfo)]
> gg <- strsplit(ff, 'HS_')
> hh <- sapply(1:3, function(i) gg[[i]][2])
> substr(hh, 1, 1)
>