[R] Help with regex replacements

Bert Gunter bgunter.4567 at gmail.com
Tue Jun 27 21:48:59 CEST 2023


OK, so you want parentheses, not "brackets". I think I misinterpreted your
specification, which I think is actually incomplete. Based on what I think
you meant, how does this work:

gsub("((\\\\|/)[[:alnum:]]+)|(\\([[:alnum:]-]+\\))", "",tmp$Text)
[1] "Я досяг того, чого хотів"              "Мені вдалося\nзробити бажане"
[3] "Я досяг  того, чого хотів "            "Я\nдосяг речей, яких хотілося досягти"
[5] "Я досяг того, чого\nхотів"             "Я досяг того, чого прагнув"
[7] "Я\nдосягнув того, чого хотів"

If you want it without the \n's, cat the above to get:
cat(gsub("((\\\\|/)[[:alnum:]]+)|(\\([[:alnum:]-]+\\))", "",tmp$Text))

Я досяг того, чого хотів Мені вдалося
зробити бажане Я досяг  того, чого хотів  Я
досяг речей, яких хотілося досягти Я досяг того, чого
хотів Я досяг того, чого прагнув Я
досягнув того, чого хотів
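
Note that element [3] above still has a doubled space and a trailing space where the parenthesised forms were removed. A minimal sketch of one way to tidy that up in base R follows. The helper name drop_variants is hypothetical, and the ASCII strings are made-up stand-ins for tmp$Text, used only so the example does not depend on locale settings:

```r
# Made-up ASCII stand-ins for tmp$Text (assumption: same four patterns
# as the real data: "(word)", "(-suffix)", "/suffix", "\suffix")
x <- c("I got (got-f) what I wanted (wanted-f)",
       "I got(-f) things",
       "I got/f what I wanted/f",
       "I got\\f what I wanted\\f")

# Hypothetical helper: remove the variants, then clean up whitespace
drop_variants <- function(text) {
  # Same pattern as above:
  #   (\\\\|/)[[:alnum:]]+     a backslash or slash, then letters/digits
  #   \\([[:alnum:]-]+\\)      a parenthesised run of letters/digits/hyphens
  pattern <- "((\\\\|/)[[:alnum:]]+)|(\\([[:alnum:]-]+\\))"
  out <- gsub(pattern, "", text)
  # Collapse runs of spaces left behind by the removal (newlines are kept)
  out <- gsub(" {2,}", " ", out)
  # Strip leading and trailing whitespace
  trimws(out)
}

cleaned <- drop_variants(x)
print(cleaned)
```

With the real data this would be called as drop_variants(tmp$Text). Be aware that the slash branch of the pattern also removes things like the "or" in "and/or", so it is worth eyeballing the result before trusting it on all the lines.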

Cheers,
Bert
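
P.S. On the error from the original str_replace() call: "\\(.*\\)" escapes both parentheses, so the pattern contains no capture group and the "\\1" in the replacement has nothing to refer to. Separately, ".*" is greedy, so on a line with two parenthesised forms it would run from the first "(" to the last ")". A minimal base R sketch of both points, using a made-up ASCII string rather than the real data:

```r
# Made-up ASCII example, standing in for one line of tmp$Text
s <- "a (b) c (d) e"

# Greedy ".*": a single match from the first "(" to the last ")"
greedy <- sub("\\(.*\\)", "", s)

# Negated class "[^)]*": each parenthesised chunk is matched on its own
each <- gsub("\\([^)]*\\)", "", s)

# To keep the contents instead, put an unescaped (capturing) group inside
# the escaped literal parentheses, so that "\\1" refers to something
kept <- gsub("\\(([^)]*)\\)", "\\1", s)

print(c(greedy = greedy, each = each, kept = kept))
```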

On Tue, Jun 27, 2023 at 11:09 AM Bert Gunter <bgunter.4567 using gmail.com> wrote:

> Does this do it for you (or get you closer):
>
>  gsub("\\[.*\\]|[\\\\] |/ ","",tmp$Text)
> [1] "Я досяг того, чого хотів"
> [2] "Мені вдалося\nзробити бажане"
> [3] "Я досяг (досягла) того, чого хотів (хотіла)"
> [4] "Я\nдосяг(-ла) речей, яких хотілося досягти"
> [5] "Я досяг/ла того, чого\nхотів/ла"
> [6] "Я досяг\\досягла того, чого прагнув\\прагнула"
> [7] "Я\nдосягнув(ла) того, чого хотів(ла)"
>
> On Tue, Jun 27, 2023 at 10:16 AM Chris Evans via R-help <
> r-help using r-project.org> wrote:
>
>> I am sure this is easy for people who are good at regexps but I'm
>> failing with it.  The situation is that I have hundreds of lines of
>> Ukrainian translations of some English. They contain things like this:
>>
>> 1 "Я досяг того, чого хотів"
>> 2 "Мені вдалося зробити бажане"
>> 3 "Я досяг (досягла) того, чого хотів (хотіла)"
>> 4 "Я досяг(-ла) речей, яких хотілося досягти"
>> 5 "Я досяг/ла того, чого хотів/ла"
>> 6 "Я досяг\\досягла того, чого прагнув\\прагнула."
>> 7 "Я досягнув(ла) того, чого хотів(ла)"
>>
>> Using dput():
>>
>> tmp <- structure(list(Text = c("Я досяг того, чого хотів", "Мені вдалося
>> зробити бажане", "Я досяг (досягла) того, чого хотів (хотіла)", "Я
>> досяг(-ла) речей, яких хотілося досягти", "Я досяг/ла того, чого
>> хотів/ла", "Я досяг\\досягла того, чого прагнув\\прагнула", "Я
>> досягнув(ла) того, чого хотів(ла)" )), row.names = c(NA, -7L), class =
>> c("tbl_df", "tbl", "data.frame" ))
>>
>> Those show four different ways translators have handled gendered words:
>>
>> 1) Ignore them and (I'm guessing) only give the masculine
>> 2) Give the feminine form of the word (or just the feminine suffix) in
>>    brackets
>> 3) Give the feminine form/suffix prefixed by a forward slash
>> 4) Give the feminine form/suffix prefixed by backslash (here a double
>>    backslash)
>>
>> I would like just to drop all these feminine gendered options. (Don't
>> worry, they'll get back in later.) So I would like to replace
>>
>> 1) anything between brackets with nothing!
>> 2) anything between a forward slash and the next space with nothing
>> 3) anything between a backslash and the next space with nothing
>>
>> but preserving the rest of the text.
>>
>> I have been trying to achieve this using str_replace_all() but I am
>> failing utterly. Here's a silly little example of my failures. This was
>> just trying to get the text I wanted to replace (as I was trying to
>> simplify the issues for my tired wetware):
>>
>> > tmp %>%
>> +   as_tibble() %>%
>> +   rename(Text = value) %>%
>> +   mutate(Text = str_replace_all(Text, fixed("."), "")) %>%
>> +   filter(row_number() < 4) %>%
>> +   mutate(Text2 = str_replace(Text, "\\(.*\\)", "\\1"))
>> Error in `mutate()`:
>> ℹ In argument: `Text2 = str_replace(Text, "\\(.*\\)", "\\1")`.
>> Caused by error in `stri_replace_first_regex()`:
>> ! Trying to access the index that is out of bounds. (U_INDEX_OUTOFBOUNDS_ERROR)
>> Run `rlang::last_trace()` to see where the error occurred.
>>
>> I have tried gurgling around the internet but am striking out so
>> throwing myself on the list. Apologies if this is trivial but I'd hate
>> to have to clean these hundreds of lines by hand though it's starting to
>> look as if I'd achieve that faster by hand than I will by banging my
>> ignorance of R regexp syntax on the problem.
>>
>> TIA,
>>
>> Chris
>>
>> --
>> Chris Evans (he/him)
>> Visiting Professor, UDLA, Quito, Ecuador & Honorary Professor,
>> University of Roehampton, London, UK.
>> Work web site: https://www.psyctc.org/psyctc/
>> CORE site: http://www.coresystemtrust.org.uk/
>> Personal site: https://www.psyctc.org/pelerinage2016/
>>
>
