[R-sig-Epi] Fwd: Identify medicines names

Tue Apr 6 16:41:35 CEST 2021

Hi Gianpaolo,

It works now, thank you!

But it is not exactly what I need, so let me explain better.

Your solution is good for identifying which medicines are antibiotics, and
my own solution already did that too:

######################################################
matches  <- unlist(sapply(patterns, function(p) grep(p, df$name,
                                                     value = FALSE,
                                                     ignore.case = TRUE)
                          )
                   )
anti <- df[matches,]
########################################################
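
As a side note, an index vector built this way can contain the same row more than once, because a name that matches several patterns is returned once per pattern. A minimal sketch of one way to keep each row once, using made-up toy data (`toy_patterns`, `toy_df` and the other names are illustrative only, not objects from this thread):

```r
# Toy data, made up for illustration only.
toy_patterns <- c("PIPERACILINA", "TAZOBACTAM", "PENICILINA")
toy_df <- data.frame(name = c("CLORETO DE SODIO 0,9% 10ML",
                              "PENICILINA G BENZATINA 1200000UI",
                              "PIPERACILINA SODICA 4G + TAZOBACTAM SODICA 0,5G"),
                     stringsAsFactors = FALSE)

# One grep() per pattern: row 3 comes back twice (once per matching pattern).
idx <- unlist(lapply(toy_patterns,
                     function(p) grep(p, toy_df$name, ignore.case = TRUE)))

# Keep each matching row once, in the original row order.
anti_rows <- sort(unique(idx))
toy_anti <- toy_df[anti_rows, , drop = FALSE]
toy_anti
```

Whether duplicates matter depends on what is done with the subset afterwards; for simple counting they usually do.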

But beyond identifying which medicines are antibiotics, I also need to:
- Create a new variable that, when the medicine is an antibiotic (that is,
it matches an entry in the patterns object), holds the matching name from
patterns.
I did this with the code below, using fuzzyjoin::regex_left_join():

#########################################################
# List of antibiotic names - object called patterns.
patterns <-  c("Oritavancina", "Oxacilina", "Pefloxacino", "Penicilina",
              "Pexiganan",  "Piperacilina-tazobactam","Tazobactam",
              "Pirazinamida", "Plazomicina", "Polimixina B",
              "Posilozid","Piperacilina")
patterns <- toupper(patterns)

# Sample Data frame where I need to find the names from the list above.
df <- data.frame(name =
                     c("CLORETO DE POTASSIO DRAGEA 600MG",
                       "CLORETO DE SODIO 0,9% SERINGA PREENCHIDA 5ML",
                       "CLORETO DE SODIO SOLUCAO INJETAVEL 0,9% 10ML",
                       "CODEINA FOSFATO SOLUCAO ORAL 3MGML 10ML ISCMPA @",
                       "CODEINA FOSFATO SOLUCAO ORAL 3MGML 5ML ISCMPA @",
                       "DipiRONA SOLUCAO INJETAVEL 500MGML 2ML",
                       "DipiRONA SOLUCAO INJETAVEL 500MGML 2ML",
                       "FUROSEMIDA SOLUCAO INJETAVEL 10MGML 2ML",
                       "HIDROCORTISONA SUCCINATO SODICO PO LIOFILO INJETAVEL 100MG",
                       "ONDANSETRONA CLORIDRATO SOLUCAO INJETAVEL 2MGML 4ML",
                       "ONDANSETRONA CLORIDRATO SOLUCAO INJETAVEL 2MGML 4ML",
                       "Penicilina G BENZATINA PO LIOFILO INJETAVEL 1200000UI",
                       "Penicilina G BENZATINA PO LIOFILO INJETAVEL 1200000UI",
                       "PIPERACILINA SODICA 4G + TAZOBACTAM SODICA 0,5G PO LIOFILO INJETAVEL"))

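library(dplyr)  # provides %>% and mutate()

df <- df %>% mutate(name = toupper(name))
patterns <- data.frame(name = patterns)
# Both data frames have a "name" column, so the result has name.x (medicine)
# and name.y (matched pattern, NA when nothing matched).
results <- fuzzyjoin::regex_left_join(df,
                                      patterns,
                                      by = "name")
results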
#########################################################
Notice in the results object that when the medicine has a compound name
("PIPERACILINA SODICA 4G + TAZOBACTAM SODICA 0,5G PO LIOFILO INJETAVEL"),
the join does not find "PIPERACILINA-TAZOBACTAM".
Instead, the code creates two rows for that medicine, one matched to
PIPERACILINA and the other to TAZOBACTAM.

I hope this explanation is clearer.
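
One possible way around this is sketched below under two assumptions. First, a hyphen in a pattern is read as "both components appear in this order, possibly with other words in between". Second, when several patterns match the same medicine, only the longest one is kept. The helpers `to_regex()` and `first_match()` and the object `names_in` are hypothetical names introduced for the illustration, and only base R is used:

```r
# Hypothetical helper: turn "A-B" into the regex "A.*B" so that the two
# components may be separated by other words (dose, salt, "+", ...).
to_regex <- function(p) gsub("-", ".*", toupper(p), fixed = TRUE)

patterns <- c("Penicilina", "Piperacilina-tazobactam", "Tazobactam",
              "Piperacilina")
names_in <- c("CLORETO DE SODIO 0,9% SERINGA PREENCHIDA 5ML",
              "PENICILINA G BENZATINA PO LIOFILO INJETAVEL 1200000UI",
              "PIPERACILINA SODICA 4G + TAZOBACTAM SODICA 0,5G PO LIOFILO INJETAVEL")

# Try the longest patterns first, so the compound name wins over its parts.
ordered <- patterns[order(nchar(patterns), decreasing = TRUE)]

# For one medicine name, return the first (longest) matching pattern, or NA.
first_match <- function(x) {
  is_hit <- vapply(ordered,
                   function(p) grepl(to_regex(p), x, ignore.case = TRUE),
                   logical(1))
  hit <- ordered[is_hit]
  if (length(hit) > 0) toupper(hit[1]) else NA_character_
}

result <- data.frame(name = names_in,
                     antibiotic = vapply(names_in, first_match, character(1),
                                         USE.NAMES = FALSE),
                     stringsAsFactors = FALSE)
result
```

This gives one row per medicine, with the compound name for the last one. The ".*" translation is order-sensitive, so a label that lists TAZOBACTAM before PIPERACILINA would need an extra pattern or a different rule.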

On Tue, Apr 6, 2021 at 03:55, Gianpaolo Romeo <
gianpaolo.romeo using gmail.com> wrote:

> Sorry,
> I wrote the code on a smartphone without access to R. Try this:
>
> require(dplyr)
>
> patterns <- c("Oritavancina", "Oxacilina", "Pefloxacino", "Penicilina",
>               "Pexiganan", "Piperacilina", "Piperacilina-tazobactam",
>               "Pirazinamida", "Plazomicina", "Polimixina B",
>               "Posilozid")
>
> patterns.new <- paste(patterns, collapse = "|")
>
>
> df <- data.frame(name =
>                    c("CLORETO DE POTASSIO DRAGEA 600MG",
>                      "CLORETO DE SODIO 0,9% SERINGA PREENCHIDA 5ML",
>                      "CLORETO DE SODIO SOLUCAO INJETAVEL 0,9% 10ML",
>                      "CODEINA FOSFATO SOLUCAO ORAL 3MGML 10ML ISCMPA @",
>                      "CODEINA FOSFATO SOLUCAO ORAL 3MGML 5ML ISCMPA @",
>                      "DipiRONA SOLUCAO INJETAVEL 500MGML 2ML",
>                      "DipiRONA SOLUCAO INJETAVEL 500MGML 2ML",
>                      "FUROSEMIDA SOLUCAO INJETAVEL 10MGML 2ML",
>                      "HIDROCORTISONA SUCCINATO SODICO PO LIOFILO INJETAVEL 100MG",
>                      "ONDANSETRONA CLORIDRATO SOLUCAO INJETAVEL 2MGML 4ML",
>                      "ONDANSETRONA CLORIDRATO SOLUCAO INJETAVEL 2MGML 4ML",
>                      "Penicilina G BENZATINA PO LIOFILO INJETAVEL 1200000UI",
>                      "Penicilina G BENZATINA PO LIOFILO INJETAVEL 1200000UI",
>                      "PIPERACILINA SODICA 4G + TAZOBACTAM SODICA 0,5G PO LIOFILO INJETAVEL"))
>
>
> results <- df %>% filter(grepl(pattern = patterns.new, x = name,
> ignore.case = TRUE))
>
> On Tue, Apr 6, 2021 at 02:06, Felipe Barletta <
> felipe.e.barletta using gmail.com> wrote:
>
>> Thanks a lot, Gianpaolo, but your suggestion didn't work.
>>
>> On Mon, Apr 5, 2021 at 4:50 PM, Gianpaolo Romeo <
>> gianpaolo.romeo using gmail.com> wrote:
>>
>>> I suggest you use the dplyr package:
>>>
>>>
>>>
>>> df %>% mutate(name = toupper(name)) %>%
>>> filter(grepl(pattern = patterns, name))
>>>
>>>
>>> If you want to search for every string that starts exactly with a specific
>>> word:
>>>
>>> patterns <- paste0("^", patterns)
>>>
>>>
>>> On Mon, Apr 5, 2021 at 20:25, Felipe Barletta <felipe.e.barletta using gmail.com>
>>> wrote:
>>>
>>>> Hi friends,
>>>>
>>>> I need to identify medicine names in a data set.
>>>> I have a list of antibiotic names and I need to identify those names in
>>>> a sample.
>>>>
>>>> When the name of the medicine is simple, my solution works, see:
>>>>
>>>> # List of antibiotic names - object called patterns.
>>>> patterns <- c("Oritavancina", "Oxacilina", "Pefloxacino", "Penicilina",
>>>>               "Pexiganan", "Piperacilina", "Piperacilina-tazobactam",
>>>>               "Pirazinamida", "Plazomicina", "Polimixina B",
>>>>               "Posilozid")
>>>>
>>>>
>>>> # Sample Data frame where I need to find the names from the list above.
>>>> df <- data.frame(name =
>>>>                      c("CLORETO DE POTASSIO DRAGEA 600MG",
>>>>                        "CLORETO DE SODIO 0,9% SERINGA PREENCHIDA 5ML",
>>>>                        "CLORETO DE SODIO SOLUCAO INJETAVEL 0,9% 10ML",
>>>>                        "CODEINA FOSFATO SOLUCAO ORAL 3MGML 10ML ISCMPA @",
>>>>                        "CODEINA FOSFATO SOLUCAO ORAL 3MGML 5ML ISCMPA @",
>>>>                        "DipiRONA SOLUCAO INJETAVEL 500MGML 2ML",
>>>>                        "DipiRONA SOLUCAO INJETAVEL 500MGML 2ML",
>>>>                        "FUROSEMIDA SOLUCAO INJETAVEL 10MGML 2ML",
>>>>                        "HIDROCORTISONA SUCCINATO SODICO PO LIOFILO INJETAVEL 100MG",
>>>>                        "ONDANSETRONA CLORIDRATO SOLUCAO INJETAVEL 2MGML 4ML",
>>>>                        "ONDANSETRONA CLORIDRATO SOLUCAO INJETAVEL 2MGML 4ML",
>>>>                        "Penicilina G BENZATINA PO LIOFILO INJETAVEL 1200000UI",
>>>>                        "Penicilina G BENZATINA PO LIOFILO INJETAVEL 1200000UI",
>>>>                        "PIPERACILINA SODICA 4G + TAZOBACTAM SODICA 0,5G PO LIOFILO INJETAVEL"))
>>>>
>>>>
>>>>
>>>>
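>>>> # regex_left_join() needs data frames on both sides, so wrap the patterns.
>>>> results <- fuzzyjoin::regex_left_join(df,
>>>>                            data.frame(name = patterns),
>>>>                            by = "name",
>>>>                            ignore_case = TRUE)
>>>>
>>>> head(results)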
>>>>
>>>> # Identify with grep() - other way.
>>>> matches  <- unlist(sapply(patterns, function(p) grep(p, df$name,
>>>>                                                      value = FALSE,
>>>>                                                      ignore.case = TRUE)
>>>>                           )
>>>>                    )
>>>>
>>>> anti <- df[matches,]
>>>>
>>>> However, when the name is compound, it does not work (for example:
>>>> Piperacilina-tazobactam).
>>>>
>>>> Can anyone help me with this issue?
>>>>
>>>> _______________________________________________
>>>> R-sig-Epi using r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-epi
>>>>
>>>
