[R] Help with regex replacements
Bert Gunter
bgunter@4567 @end|ng |rom gm@||@com
Tue Jun 27 20:09:39 CEST 2023
Does this do it for you (or get you closer):
gsub("\\[.*\\]|[\\\\] |/ ","",tmp$Text)
[1] "Я досяг того, чого хотів"
[2] "Мені вдалося\nзробити бажане"
[3] "Я досяг (досягла) того, чого хотів (хотіла)"
[4] "Я\nдосяг(-ла) речей, яких хотілося досягти"
[5] "Я досяг/ла того, чого\nхотів/ла"
[6] "Я досяг\\досягла того, чого прагнув\\прагнула"
[7] "Я\nдосягнув(ла) того, чого хотів(ла)"
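
The output above still contains the bracketed and slash-prefixed variants, so here is a minimal sketch that follows the three rules stated in the original question more literally. It assumes "brackets" means parentheses, that a feminine variant after "/" or "\" runs up to the next whitespace (or the end of the string), and that a space directly before an opening parenthesis should be dropped too. The names tmp_text, pattern and cleaned are made up for this illustration, and the strings are the seven examples with the line wraps removed.

```r
# The seven example strings from the question, unwrapped.
tmp_text <- c(
  "Я досяг того, чого хотів",
  "Мені вдалося зробити бажане",
  "Я досяг (досягла) того, чого хотів (хотіла)",
  "Я досяг(-ла) речей, яких хотілося досягти",
  "Я досяг/ла того, чого хотів/ла",
  "Я досяг\\досягла того, чого прагнув\\прагнула",
  "Я досягнув(ла) того, чого хотів(ла)"
)

# Two alternatives joined by "|":
#   \s*\([^)]*\)  optional whitespace, then "(", anything but ")", then ")"
#   [/\\]\S*      a forward slash or backslash, then all following
#                 non-whitespace characters
# Each regex backslash is doubled once more inside the R string literal.
pattern <- "\\s*\\([^)]*\\)|[/\\\\]\\S*"

cleaned <- gsub(pattern, "", tmp_text, perl = TRUE)
print(cleaned)
```

One caveat: \S* also swallows punctuation attached to the variant, so "прагнув\прагнула." would lose its final full stop; strip or handle trailing punctuation first if that matters. As for the U_INDEX_OUTOFBOUNDS_ERROR in the question: in "\\(.*\\)" the parentheses are escaped and therefore literal, so the pattern has no capture group, and the replacement "\\1" refers to a group that does not exist. The same pattern should also work with str_replace_all(Text, pattern, ""), though I have only reasoned that through, not run it against the full data.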
On Tue, Jun 27, 2023 at 10:16 AM Chris Evans via R-help <
r-help using r-project.org> wrote:
> I am sure this is easy for people who are good at regexps but I'm
> failing with it. The situation is that I have hundreds of lines of
> Ukrainian translations of some English. They contain things like this:
>
> 1 "Я досяг того, чого хотів"
> 2 "Мені вдалося зробити бажане"
> 3 "Я досяг (досягла) того, чого хотів (хотіла)"
> 4 "Я досяг(-ла) речей, яких хотілося досягти"
> 5 "Я досяг/ла того, чого хотів/ла"
> 6 "Я досяг\\досягла того, чого прагнув\\прагнула."
> 7 "Я досягнув(ла) того, чого хотів(ла)"
>
> Using dput():
>
> tmp <- structure(list(Text = c("Я досяг того, чого хотів",
>     "Мені вдалося зробити бажане",
>     "Я досяг (досягла) того, чого хотів (хотіла)",
>     "Я досяг(-ла) речей, яких хотілося досягти",
>     "Я досяг/ла того, чого хотів/ла",
>     "Я досяг\\досягла того, чого прагнув\\прагнула",
>     "Я досягнув(ла) того, чого хотів(ла)")),
>     row.names = c(NA, -7L), class = c("tbl_df", "tbl", "data.frame"))
>
> Those show four different ways translators have handled gendered words:
> 1) Ignore them and (I'm guessing) only give the masculine
> 2) Give the feminine form of the word (or just the feminine suffix) in
>    brackets
> 3) Give the feminine form/suffix prefixed by a forward slash
> 4) Give the feminine form/suffix prefixed by backslash (here a double
>    backslash)
>
> I would like just to drop all these feminine gendered options. (Don't
> worry, they'll get back in later.) So I would like to replace
> 1) anything between brackets with nothing!
> 2) anything between a forward slash and the next space with nothing
> 3) anything between a backslash and the next space with nothing
> but preserving the rest of the text.
>
> I have been trying to achieve this using str_replace_all() but I am
> failing utterly. Here's a silly little example of my failures. This was
> just trying to get the text I wanted to replace (as I was trying to
> simplify the issues for my tired wetware):
>
> tmp %>%
>     as_tibble() %>%
>     rename(Text = value) %>%
>     mutate(Text = str_replace_all(Text, fixed("."), "")) %>%
>     filter(row_number() < 4) %>%
>     mutate(Text2 = str_replace(Text, "\\(.*\\)", "\\1"))
>
> Error in `mutate()`:
> ℹ In argument: `Text2 = str_replace(Text, "\\(.*\\)", "\\1")`.
> Caused by error in `stri_replace_first_regex()`:
> ! Trying to access the index that is out of bounds.
>   (U_INDEX_OUTOFBOUNDS_ERROR)
> Run `rlang::last_trace()` to see where the error occurred.
>
> I have tried gurgling around the internet but am striking out so
> throwing myself on the list. Apologies if this is trivial but I'd hate
> to have to clean these hundreds of lines by hand though it's starting to
> look as if I'd achieve that faster by hand than I will by banging my
> ignorance of R regexp syntax on the problem.
>
> TIA,
> Chris
>
> --
> Chris Evans (he/him)
> Visiting Professor, UDLA, Quito, Ecuador & Honorary Professor,
> University of Roehampton, London, UK.
> Work web site: https://www.psyctc.org/psyctc/
> CORE site: http://www.coresystemtrust.org.uk/
> Personal site: https://www.psyctc.org/pelerinage2016/
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>