[R-sig-Epi] Fwd: Identify medicines names
C.H.
ch@|n@@wt|ney @end|ng |rom gm@||@com
Wed Apr 7 10:13:41 CEST 2021
A simple solution is to use a text analysis package such as quanteda:

require(quanteda)
require(dplyr)  # provides %>%

# patterns is the character vector of drug names (not the data.frame version)
drug_dictionary <- as.dictionary(data.frame(word = toupper(patterns),
                                            sentiment = patterns))
corpus(df$name) %>%
  tokens() %>%
  tokens_compound(drug_dictionary) %>%
  dfm() %>%
  dfm_lookup(drug_dictionary) %>%
  quanteda::convert(to = "data.frame")

On Tue, Apr 6, 2021 at 4:42 PM Felipe Barletta
<felipe.e.barletta using gmail.com> wrote:
>
> Hi Gianpaolo,
>
> It works now, thank you!
>
> But it is not exactly what I need.
> Let me explain in more detail.
>
> Your solution works for identifying which medicines are antibiotics, and my
> own solution handles that as well:
>
> ######################################################
> matches <- unlist(sapply(patterns, function(p) {
>   grep(p, df$name, value = FALSE, ignore.case = TRUE)
> }))
> anti <- df[matches,]
> ########################################################
>
>
> But beyond identifying which medicines are antibiotics, I also need to:
> - Create a new variable that, when the medicine is an antibiotic (that is,
>   when it appears in the patterns object), holds the matching name from
>   patterns.
> I did this with the code below, using fuzzyjoin::regex_left_join():
>
> #########################################################
> # List of medicine names (the object called patterns).
> patterns <- c("Oritavancina", "Oxacilina", "Pefloxacino", "Penicilina",
> "Pexiganan", "Piperacilina-tazobactam","Tazobactam",
> "Pirazinamida", "Plazomicina", "Polimixina B",
> "Posilozid","Piperacilina")
> patterns <- toupper(patterns)
>
> # Sample Data frame where I need to find the names from the list above.
> df <- data.frame(name =
>   c("CLORETO DE POTASSIO DRAGEA 600MG",
>     "CLORETO DE SODIO 0,9% SERINGA PREENCHIDA 5ML",
>     "CLORETO DE SODIO SOLUCAO INJETAVEL 0,9% 10ML",
>     "CODEINA FOSFATO SOLUCAO ORAL 3MGML 10ML ISCMPA @",
>     "CODEINA FOSFATO SOLUCAO ORAL 3MGML 5ML ISCMPA @",
>     "DipiRONA SOLUCAO INJETAVEL 500MGML 2ML",
>     "DipiRONA SOLUCAO INJETAVEL 500MGML 2ML",
>     "FUROSEMIDA SOLUCAO INJETAVEL 10MGML 2ML",
>     "HIDROCORTISONA SUCCINATO SODICO PO LIOFILO INJETAVEL 100MG",
>     "ONDANSETRONA CLORIDRATO SOLUCAO INJETAVEL 2MGML 4ML",
>     "ONDANSETRONA CLORIDRATO SOLUCAO INJETAVEL 2MGML 4ML",
>     "Penicilina G BENZATINA PO LIOFILO INJETAVEL 1200000UI",
>     "Penicilina G BENZATINA PO LIOFILO INJETAVEL 1200000UI",
>     "PIPERACILINA SODICA 4G + TAZOBACTAM SODICA 0,5G PO LIOFILO INJETAVEL"))
>
>
> library(dplyr)  # for %>% and mutate()
> df <- df %>% mutate(name = toupper(name))
> patterns <- data.frame(name = patterns)
> results <- fuzzyjoin::regex_left_join(df, patterns, by = "name")
> results
> #########################################################
> Notice in the results object that when the medicine name is a combination
> ("PIPERACILINA SODICA 4G + TAZOBACTAM SODICA 0,5G PO LIOFILO INJETAVEL"),
> the solution does not find "PIPERACILINA-TAZOBACTAM".
> Instead, the code creates two rows: one with PIPERACILINA and another with
> TAZOBACTAM.
>
> I hope this explanation is clearer.
>
> On Tue, Apr 6, 2021 at 03:55, Gianpaolo Romeo <
> gianpaolo.romeo using gmail.com> wrote:
>
> > Sorry,
> > I wrote the code on a smartphone without running it in R. Try this:
> >
> > require(dplyr)
> >
> > patterns <- c("Oritavancina", "Oxacilina", "Pefloxacino", "Penicilina",
> > "Pexiganan", "Piperacilina", "Piperacilina-tazobactam",
> > "Pirazinamida", "Plazomicina", "Polimixina B",
> > "Posilozid")
> >
> > patterns.new <- paste(patterns, collapse = "|")
> >
> >
> > df <- data.frame(name =
> >   c("CLORETO DE POTASSIO DRAGEA 600MG",
> >     "CLORETO DE SODIO 0,9% SERINGA PREENCHIDA 5ML",
> >     "CLORETO DE SODIO SOLUCAO INJETAVEL 0,9% 10ML",
> >     "CODEINA FOSFATO SOLUCAO ORAL 3MGML 10ML ISCMPA @",
> >     "CODEINA FOSFATO SOLUCAO ORAL 3MGML 5ML ISCMPA @",
> >     "DipiRONA SOLUCAO INJETAVEL 500MGML 2ML",
> >     "DipiRONA SOLUCAO INJETAVEL 500MGML 2ML",
> >     "FUROSEMIDA SOLUCAO INJETAVEL 10MGML 2ML",
> >     "HIDROCORTISONA SUCCINATO SODICO PO LIOFILO INJETAVEL 100MG",
> >     "ONDANSETRONA CLORIDRATO SOLUCAO INJETAVEL 2MGML 4ML",
> >     "ONDANSETRONA CLORIDRATO SOLUCAO INJETAVEL 2MGML 4ML",
> >     "Penicilina G BENZATINA PO LIOFILO INJETAVEL 1200000UI",
> >     "Penicilina G BENZATINA PO LIOFILO INJETAVEL 1200000UI",
> >     "PIPERACILINA SODICA 4G + TAZOBACTAM SODICA 0,5G PO LIOFILO INJETAVEL"))
> >
> >
> > results <- df %>% filter(grepl(pattern = patterns.new, x = name,
> > ignore.case = TRUE))
> >
> > On Tue, Apr 6, 2021 at 02:06, Felipe Barletta <
> > felipe.e.barletta using gmail.com> wrote:
> >
> >> Thanks a lot, Gianpaolo, but your suggestion didn't work.
> >>
> >> On Mon, Apr 5, 2021 at 4:50 PM, Gianpaolo Romeo <
> >> gianpaolo.romeo using gmail.com> wrote:
> >>
> >>> I suggest using the dplyr package:
> >>>
> >>>
> >>>
> >>> df %>% mutate(name = toupper(name)) %>%
> >>> filter(grepl(pattern = patterns, name))
> >>>
> >>>
> >>> If you want to search for every string that starts exactly with a specific
> >>> word:
> >>>
> >>> patterns <- paste0("^", patterns)
> >>>
> >>>
> >>> On Mon, Apr 5, 2021 at 20:25, Felipe Barletta <felipe.e.barletta using gmail.com>
> >>> wrote:
> >>>
> >>>> Hi friends,
> >>>>
> >>>> I need to identify medicine names in a data set.
> >>>> I have a list of antibiotic names and need to find those names in a
> >>>> sample.
> >>>>
> >>>> When the medicine name is simple, my solution works; see:
> >>>>
> >>>> # List of medicine names (the object called patterns).
> >>>> patterns <- c("Oritavancina", "Oxacilina", "Pefloxacino", "Penicilina",
> >>>> "Pexiganan", "Piperacilina", "Piperacilina-tazobactam",
> >>>> "Pirazinamida", "Plazomicina", "Polimixina B",
> >>>> "Posilozid")
> >>>>
> >>>>
> >>>> # Sample Data frame where I need to find the names from the list above.
> >>>> df <- data.frame(name =
> >>>>   c("CLORETO DE POTASSIO DRAGEA 600MG",
> >>>>     "CLORETO DE SODIO 0,9% SERINGA PREENCHIDA 5ML",
> >>>>     "CLORETO DE SODIO SOLUCAO INJETAVEL 0,9% 10ML",
> >>>>     "CODEINA FOSFATO SOLUCAO ORAL 3MGML 10ML ISCMPA @",
> >>>>     "CODEINA FOSFATO SOLUCAO ORAL 3MGML 5ML ISCMPA @",
> >>>>     "DipiRONA SOLUCAO INJETAVEL 500MGML 2ML",
> >>>>     "DipiRONA SOLUCAO INJETAVEL 500MGML 2ML",
> >>>>     "FUROSEMIDA SOLUCAO INJETAVEL 10MGML 2ML",
> >>>>     "HIDROCORTISONA SUCCINATO SODICO PO LIOFILO INJETAVEL 100MG",
> >>>>     "ONDANSETRONA CLORIDRATO SOLUCAO INJETAVEL 2MGML 4ML",
> >>>>     "ONDANSETRONA CLORIDRATO SOLUCAO INJETAVEL 2MGML 4ML",
> >>>>     "Penicilina G BENZATINA PO LIOFILO INJETAVEL 1200000UI",
> >>>>     "Penicilina G BENZATINA PO LIOFILO INJETAVEL 1200000UI",
> >>>>     "PIPERACILINA SODICA 4G + TAZOBACTAM SODICA 0,5G PO LIOFILO INJETAVEL"))
> >>>>
> >>>>
> >>>>
> >>>>
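> >>>> # regex_left_join() expects two data frames, so wrap the pattern vector.
> >>>> results <- fuzzyjoin::regex_left_join(df,
> >>>>                                       data.frame(name = patterns),
> >>>>                                       by = "name",
> >>>>                                       ignore_case = TRUE)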
> >>>>
> >>>> head(results)
> >>>>
> >>>> # Alternative: identify matches with grep().
> >>>> matches <- unlist(sapply(patterns, function(p) {
> >>>>   grep(p, df$name, value = FALSE, ignore.case = TRUE)
> >>>> }))
> >>>>
> >>>> anti <- df[matches,]
> >>>>
> >>>> However, when the name is a compound it does not work (for example,
> >>>> Piperacillin-tazobactam).
> >>>>
> >>>> Can anyone help me with this issue?
> >>>>
> >>>> [[alternative HTML version deleted]]
> >>>>
> >>>> _______________________________________________
> >>>> R-sig-Epi using r-project.org mailing list
> >>>> https://stat.ethz.ch/mailman/listinfo/r-sig-epi
> >>>>
> >>>
>