[R] Fwd: Processing a hierarchical string name

Thu Jun 29 01:32:01 CEST 2023

Sorry, not cc'ed to the list.

---------- Forwarded message ---------
From: Bert Gunter <bgunter.4567 using gmail.com>
Date: Wed, Jun 28, 2023 at 4:18 PM
Subject: Re: [R] Processing a hierarchical string name
To: Ivan Krylov <krylov.r00t using gmail.com>
Cc: Kevin Zembower via R-help <r-help using r-project.org>

I probably misunderstand what you want to do, but for:
test <- c(" !!Total:",
" !!Total:!!Population of one race:",
" !!Total:!!Population of one race:!!White alone",
" !!Total:!!Population of one race:!!Black or African American alone",
" !!Total:!!Population of one race:!!American Indian and Alaska Native alone",
" !!Total:!!Population of one race:!!Asian alone" )

gsub(".+!","",test)

gives:
[1] "Total:"                                  "Population of one race:"
[3] "White alone"                             "Black or African American alone"
[5] "American Indian and Alaska Native alone" "Asian alone"

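which is what you said you wanted, as far as I can see.
Note that this depends on the strict structure of the input character
vector (test) and on the greediness of the regex matching: ".+" consumes
as much as it can, so the match runs through the last "!" in each string.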
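
To make the greediness point concrete, here is a small sketch comparing the
greedy pattern with a lazy one on a shortened copy of the vector. The `odd`
label is made up for illustration (it is not from the original data), and the
"^.*!!" pattern is only one possible way to anchor on the last "!!" pair.

```r
test <- c(" !!Total:",
          " !!Total:!!Population of one race:",
          " !!Total:!!Population of one race:!!White alone")

# Greedy: ".+" grabs as much as it can, so the match ends at the LAST "!".
greedy <- sub(".+!", "", test)

# Lazy quantifier (PCRE semantics via perl = TRUE): the match ends at the
# FIRST "!" instead, so only the leading " !" is removed.
lazy <- sub(".+?!", "", test, perl = TRUE)

# Hypothetical label with a stray "!" inside the final segment.
odd <- " !!Total:!!Other! alone"
cut_too_far <- sub(".+!", "", odd)    # stops at the stray "!"
anchored    <- sub("^.*!!", "", odd)  # stops at the last "!!" pair

print(greedy)
print(lazy)
print(cut_too_far)
print(anchored)
```

If the labels can never contain a single "!" on its own, the two greedy
patterns behave the same on this input.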

Feel free to ignore without response if I have misunderstood.

Cheers,
Bert

On Wed, Jun 28, 2023 at 1:56 PM Ivan Krylov <krylov.r00t using gmail.com> wrote:
>
> On Wed, 28 Jun 2023 20:29:23 +0000
> Kevin Zembower via R-help <r-help using r-project.org> wrote:
>
> > I think my algorithm for the labels is:
> > 1. keep everything from the last "!!" up to and including the last
> > character
> > 2. for everything remaining, replace each "!!.*:" group with a single
> > space.
>
> If you remove the initial ' !!', the problem becomes a more tractable
> "replace each group of non-'!' followed by '!!' with one space":
>
> bg3_race_sum$label |>
>  (\(.) sub('^ !!', '', .))() |>
>  (\(.) gsub('[^!]*!!', ' ', .))()
>
> But that solution might not have been possible if the task were
> slightly different.
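
For reference, a sketch of what the two substitutions above produce, run on a
shortened stand-in vector rather than on bg3_race_sum$label (which is not
shown in this thread). Plain nested calls are used instead of the pipe so the
sketch does not depend on R 4.1 syntax; `labels`, `stripped` and `indented`
are names chosen here for illustration.

```r
# Stand-in for bg3_race_sum$label (assumed to have this shape)
labels <- c(" !!Total:",
            " !!Total:!!Population of one race:",
            " !!Total:!!Population of one race:!!White alone")

# Step 1: drop the initial " !!" so every remaining "!!" separates two levels
stripped <- sub("^ !!", "", labels)

# Step 2: replace each "non-'!' run followed by '!!'" with a single space,
# which leaves one leading space per ancestor level
indented <- gsub("[^!]*!!", " ", stripped)

print(indented)
```

Each label ends up indented by one space per level below "Total:".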
>
> > I can split the label using str_split(label, pattern = "!!") to get a
> > vector of strings, but don't know how to work on the last string and
> > all the rest of the strings separately.
>
> str_split() would have given you a list of character vectors. You can
> use lapply to evaluate a function on each vector inside that list.
> Inside the function, use length(x) (if `x` is the argument of the
> function) to find out how many spaces to produce and which element of
> the vector is the last one. (For code golf points, use rev(x)[1] to get
> the last element.)
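
A minimal sketch of the split-and-apply idea described above, under some
assumptions: base strsplit() stands in for stringr's str_split(), vapply() is
used as a type-checked variant of lapply(), and the helper name indent_label
is made up for this example. The input is again a shortened stand-in vector.

```r
labels <- c(" !!Total:",
            " !!Total:!!Population of one race:",
            " !!Total:!!Population of one race:!!White alone")

# One character vector per label; the first piece is the " " that
# precedes the initial "!!"
parts <- strsplit(labels, "!!", fixed = TRUE)

# Hypothetical helper: keep the last piece, indent it by its depth
indent_label <- function(x) {
  n <- length(x)
  depth <- n - 2                   # x[1] is the leading " ", x[n] the label
  paste0(strrep(" ", depth), x[n])
}

indented <- vapply(parts, indent_label, character(1))
print(indented)
```

On this input the result matches the regex approach: one leading space per
level below "Total:".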
>
> --
> Best regards,
> Ivan