[R-sig-Epi] Identify medicines names

Leonardo Fontenelle |eon@rdo| @end|ng |rom |eon@rdo|@med@br
Mon Apr 5 21:37:15 CEST 2021


I am not sure there is a ready-made solution for you, but you should definitely take a look at the "Denominação Comum Brasileira" (DCB), the official list of medicine names in Brazil: https://www.gov.br/anvisa/pt-br/assuntos/farmacopeia/dcb

There might be a lot of programming challenges, though. Word order varies: there is "maleato de enalapril" but also "enalapril, maleato". And if prescribers are able to type the names of the medicines as free text, they will misspell "anlodipino" in every possible way.
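
To make those two challenges concrete, here is a minimal base R sketch, not a tested solution for your data. It compares names word by word, so word order and hyphens stop mattering, and it tolerates small typos through edit distance. The helper names (normalize_name, tokenize, matches_all_tokens), the stop-word list, and the thresholds (one edit allowed, only for words of six or more letters) are all my own assumptions and would need tuning. Accented letters are not handled here and would have to be transliterated first.

```r
# Hypothetical helpers for illustration; base R only (adist() is in utils).

# Connective words ignored when comparing names (assumed list).
stopwords <- c("DE", "DO", "DA")

# Upper-case the name and turn punctuation, hyphens and "+" into spaces.
normalize_name <- function(x) {
  x <- toupper(x)
  x <- gsub("[^A-Z0-9]+", " ", x)
  trimws(x)
}

# Split one name into its words, dropping the connectives.
tokenize <- function(x) {
  words <- strsplit(normalize_name(x), " ", fixed = TRUE)[[1]]
  words[nzchar(words) & !(words %in% stopwords)]
}

# TRUE when every word of `pattern` has a close enough match among the words
# of `text`, regardless of order. Short words must match exactly; words with
# six or more letters may differ by up to `max_dist` edits.
matches_all_tokens <- function(pattern, text, max_dist = 1) {
  p <- tokenize(pattern)
  w <- tokenize(text)
  if (length(p) == 0 || length(w) == 0) return(FALSE)
  d <- utils::adist(p, w)          # rows: pattern words, columns: text words
  best <- apply(d, 1, min)         # closest text word for each pattern word
  tol <- ifelse(nchar(p) >= 6, max_dist, 0)
  all(best <= tol)
}

# Compound name written with extra words in between
matches_all_tokens("Piperacilina-tazobactam",
                   "PIPERACILINA SODICA 4G + TAZOBACTAM SODICA 0,5G PO LIOFILO INJETAVEL")

# Same medicine, different word order
matches_all_tokens("maleato de enalapril", "enalapril, maleato")

# One-letter misspelling
matches_all_tokens("anlodipino", "AMLODIPINO 5MG")
```

All three calls should return TRUE. To screen a whole column, you could loop over the pattern list with sapply() and keep the rows where any pattern matches. A more tolerant threshold catches more misspellings but also raises the risk of confusing similar names, so the matches would need manual review.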

Good luck,

*Leonardo Ferreira Fontenelle*
ORCID iD: 0000-0003-4064-433X <https://orcid.org/0000-0003-4064-433X>
Twitter: @doutorleonardo <https://twitter.com/doutorleonardo>


On Mon, Apr 5, 2021, at 15:25, Felipe Barletta wrote:
> Hi friends,
> 
> I need to identify medicines names in a data set.
> I have a list of antibiotic names and I need to identify those names in a
> sample.
> 
> When the name of the medicine is simple, my solution works; see:
> 
> # List of medicine names to search for (object called patterns).
> patterns <- c("Oritavancina", "Oxacilina", "Pefloxacino", "Penicilina",
>               "Pexiganan", "Piperacilina", "Piperacilina-tazobactam",
>               "Pirazinamida", "Plazomicina", "Polimixina B",
>               "Posilozid")
> 
> 
> # Sample Data frame where I need to find the names from the list above.
> df <- data.frame(name =
>                      c("CLORETO DE POTASSIO DRAGEA 600MG",
>                        "CLORETO DE SODIO 0,9% SERINGA PREENCHIDA 5ML",
>                        "CLORETO DE SODIO SOLUCAO INJETAVEL 0,9% 10ML",
>                        "CODEINA FOSFATO SOLUCAO ORAL 3MGML 10ML ISCMPA @",
>                        "CODEINA FOSFATO SOLUCAO ORAL 3MGML 5ML ISCMPA @",
>                        "DipiRONA SOLUCAO INJETAVEL 500MGML 2ML",
>                        "DipiRONA SOLUCAO INJETAVEL 500MGML 2ML",
>                        "FUROSEMIDA SOLUCAO INJETAVEL 10MGML 2ML",
>                        "HIDROCORTISONA SUCCINATO SODICO PO LIOFILO INJETAVEL 100MG",
>                        "ONDANSETRONA CLORIDRATO SOLUCAO INJETAVEL 2MGML 4ML",
>                        "ONDANSETRONA CLORIDRATO SOLUCAO INJETAVEL 2MGML 4ML",
>                        "Penicilina G BENZATINA PO LIOFILO INJETAVEL 1200000UI",
>                        "Penicilina G BENZATINA PO LIOFILO INJETAVEL 1200000UI",
>                        "PIPERACILINA SODICA 4G + TAZOBACTAM SODICA 0,5G PO LIOFILO INJETAVEL"))
> 
> 
> 
> 
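> # regex_left_join() comes from the fuzzyjoin package and joins two data
> # frames, so the patterns are wrapped in a data frame first.
> library(fuzzyjoin)
> 
> results <- regex_left_join(df,
>                            data.frame(pattern = patterns),
>                            by = c(name = "pattern"),
>                            ignore_case = TRUE)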
> 
> head(results)
> 
> # Alternative: find the matching row numbers with grep().
> matches  <- unlist(sapply(patterns, function(p) grep(p, df$name,
>                                                      value = FALSE,
>                                                      ignore.case = TRUE)
>                           )
>                    )
> 
> # drop = FALSE keeps the result a data frame even with a single column.
> anti <- df[matches, , drop = FALSE]
> 
> However, when the name is a compound one, it does not work (for example:
> Piperacilina-tazobactam).
> 
> Can anyone help me with this issue?
> 