[R] Regex to stop at first capital letter after sequence

Sarah Goslee sarah.goslee at gmail.com
Mon Dec 19 23:01:46 CET 2016


Hi,

If your actual data are of the same form as your sample data, why not just:
x <- c("PPA 06 - Promo Vasito", "PPA 05 - Cuentos",
"PPA 04 - Promo vasito", "PPA 03 - Promoción escolar",
"PPA - Saluda a tu pediatra", "PPL - Dia del Pediatra")

sub("^.* - ", "", x)
[1] "Promo Vasito"         "Cuentos"              "Promo vasito"
[4] "Promoción escolar"    "Saluda a tu pediatra" "Dia del Pediatra"
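
As a side note on why the quoted attempts behave the way they do: in "(PPA|PPL.*)" the alternation splits the pattern into "PPA" or "PPL.*", so the ".*" only applies to the PPL branch, and a greedy ".*" runs to the last capital letter rather than the first. The following is only a sketch under the assumption that every entry starts with PPA or PPL, optionally followed by a number, and then " - ". The string y is a made-up example that is not in the posted data; it is there to show how greedy and lazy matching differ when a title itself contains " - ".

```r
x <- c("PPA 06 - Promo Vasito", "PPA 05 - Cuentos",
       "PPA 04 - Promo vasito", "PPA 03 - Promoción escolar",
       "PPA - Saluda a tu pediatra", "PPL - Dia del Pediatra")

# Group the alternation so both prefixes share the rest of the pattern,
# then consume an optional number and the " - " separator.
# Assumption: the prefix is always PPA or PPL.
res_prefix <- sub("^(PPA|PPL)[ 0-9]* - ", "", x)

# Hypothetical title containing a second " - " (not part of the original data).
y <- "PPA 07 - Promo - Vasito"

# Greedy: ".*" extends to the LAST " - " in the string.
res_greedy <- sub("^.* - ", "", y)

# Lazy (PCRE, hence perl = TRUE): ".*?" stops at the FIRST " - ".
res_lazy <- sub("^.*? - ", "", y, perl = TRUE)
```

On the six sample strings, res_prefix should be the same as the result of sub("^.* - ", "", x) above. The two approaches only diverge on input like y, where the greedy form keeps just the text after the final separator.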



On Mon, Dec 19, 2016 at 4:25 PM, Omar André Gonzáles Díaz
<oma.gonzales at gmail.com> wrote:
> I have the following strings:
>
> [1] "PPA 06 - Promo Vasito"      [2] "PPA 05 - Cuentos"
> [3] "PPA 04 - Promo vasito"      [4] "PPA 03 - Promoción escolar"
> [5] "PPA - Saluda a tu pediatra" [6] "PPL - Dia del Pediatra"
>
> *Desired result*:
>
> [1] "Promo Vasito"                 "Cuentos"                "Promo vasito"
>
> [4] "Promoción escolar"      "Saluda a tu pediatra"   "Dia del Pediatra"
>
>
> *First attempt*:
>
> After this line:
>
> mead_nov$`Nombre del anuncio` <- gsub("(PPA.*)([A-Z].*)", "\\2",
> mead_nov$`Nombre del anuncio`)
>
> I get these:
>
> [1] "Vasito"                 [2] "Cuentos"                [3] "Promo vasito"
> [4] "Promoción escolar"      [5] "Saluda a tu pediatra"   [6] "PPL - Dia del Pediatra"
>
>
> *Second attempt:*
>
> mead_nov$`Nombre del anuncio` <- gsub("(PPA|PPL.*)([A-Z].*)", "\\2",
> mead_nov$`Nombre del anuncio`)
>
> I get this:
>
> [1] "PPA 06 - Promo Vasito"     [2] "PPA 05 - Cuentos"
> [3] "PPA 04 - Promo vasito"      [4] "PPA 03 - Promoción escolar"
> [5] "PPA - Saluda a tu pediatra" [6] "Pediatra"
>
>
> Thank you for your help.
>

-- 
Sarah Goslee
http://www.functionaldiversity.org


