[R] Processing a hierarchical string name
Kevin Zembower
kev|n @end|ng |rom zembower@org
Thu Jun 29 20:35:25 CEST 2023
Ivan and Bert, thank you so much for your help.
Ivan, your solution worked perfectly. I didn't really understand how to
do string processing on a vector of strings, and your solution
demonstrated it for me. I modified it to work with the tidyverse's
stringr package in this way:
bg3_race_sum <- bg3_race %>%
    left_join(pl_vars, by=c("variable" = "name")) %>%
    group_by(variable) %>%
    summarize(count = sum(value)) %>%
    left_join(pl_vars, by=c("variable" = "name")) %>%
    filter(count > 0) %>%
    .$label %>%
    str_replace("^ !!", "") %>%        # Drop the leading ' !!'
    str_replace_all("[^!]*!!", " ")    # Replace each 'text!!' group with one space
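To see what the two replacements do, here is a small sketch on a few made-up labels. The sample strings, including "White alone", are assumptions for illustration and are not taken from the actual data; only the ' !!Level:!!Level' shape follows the labels discussed in this thread.

```r
library(stringr)

# Made-up sample labels in the assumed ' !!Level:!!Level' format
labels <- c(
  " !!Total:",
  " !!Total:!!Population of one race:",
  " !!Total:!!Population of one race:!!White alone",
  " !!Total:!!Population of two or more races:"
)

indented <- labels |>
  str_replace("^ !!", "") |>          # drop the leading ' !!'
  str_replace_all("[^!]*!!", " ")     # each 'text!!' group becomes one space

writeLines(indented)
```

Each label ends up with only its last component, indented by one space for every level that sat in front of it.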
Bert, your solution was close to correct. It dropped the right text, but
it didn't insert a space in place of each piece of text that precedes a
"!!" separator. I'm using those spaces to preserve the hierarchical
nature of the numbers, that is, how numbers lower in the chart are
included in the numbers above them. For instance, the "Total:" number is
the sum of "Population of one race" and "Population of two or more races".
Thank you both for helping me with this specific problem and for
increasing my knowledge and abilities with R.
-Kevin
On 6/28/23 16:56, Ivan Krylov wrote:
> On Wed, 28 Jun 2023 20:29:23 +0000
> Kevin Zembower via R-help <r-help using r-project.org> wrote:
>
>> I think my algorithm for the labels is:
>> 1. keep everything from the last "!!" up to and including the last
>> character
>> 2. for everything remaining, replace each "!!.*:" group with a single
>> space.
>
> If you remove the initial ' !!', the problem becomes a more tractable
> "replace each group of non-'!' followed by '!!' with one space":
>
> bg3_race_sum$label |>
> (\(.) sub('^ !!', '', .))() |>
> (\(.) gsub('[^!]*!!', ' ', .))()
>
> But that solution could have been impossible if the task was slightly
> different.
>
>> I can split the label using str_split(label, pattern = "!!") to get a
>> vector of strings, but don't know how to work on the last string and
>> all the rest of the strings separately.
>
> str_split() would have given you a list of character vectors. You can
> use lapply to evaluate a function on each vector inside that list.
> Inside the function, use length(x) (if `x` is the argument of the
> function) to find out how many spaces to produce and which element of
> the vector is the last one. (For code golf points, use rev(x)[1] to get
> the last element.)
>
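The split-based approach described in the quoted reply can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses base R's strsplit() as a stand-in for str_split(), vapply() rather than lapply() so the result comes back as a character vector, and a hypothetical helper named indent_label(). The sample labels are made up.

```r
# Made-up sample labels in the assumed ' !!Level:!!Level' format
labels <- c(
  " !!Total:",
  " !!Total:!!Population of one race:",
  " !!Total:!!Population of one race:!!White alone"
)

# Hypothetical helper: x is one character vector of pieces from the split.
# The first piece is the leading " " and the last piece is the text to keep,
# so the pieces in between determine how many spaces to produce.
indent_label <- function(x) {
  n <- length(x)
  paste0(strrep(" ", max(n - 2, 0)), x[n])
}

# strsplit() returns a list with one character vector per label
pieces <- strsplit(labels, "!!", fixed = TRUE)

result <- vapply(pieces, indent_label, character(1))
writeLines(result)
```

For these sample labels the output matches what the two regular-expression replacements produce: the last component of each label, preceded by one space per intermediate level.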