[R] Correcting dates in research / medical record using R

PIKAL Petr petr.pikal using precheza.cz
Thu Sep 13 09:53:48 CEST 2018


Hi

Please send your replies to the R-help list as well; others may offer better or different solutions.

I am no regex expert myself, so if all your files are formatted the same way, I would use strsplit.

# I read the record header into the object test
test<-readLines("clipboard")
str(test)
 chr [1:4] "PATIENT NAME: CONFIDENTIAL,#12345" "PATIENT ID #: 12345" ...

# here is something similar to your csv file
test2<-read.table("clipboard", header=TRUE)
test2
    Id1   Id2 VisitDate
1 12345 12345  4/3/2018
2 11111 11111  5/4/2018

# here I split the second line of the patient record, select the 4th item and compare it with the Id2 values from the csv file

sel<-which(test2$Id2 == as.numeric(unlist(strsplit(test[2], " "))[4]))

# I take the third line of the patient record and split it
out<-unlist(strsplit(test[3], split=" "))

# and replace the 4th item with the selected VisitDate value from the csv file
out[4] <- as.character(test2$VisitDate[sel])

# note the difference between factors and characters here: as.character() is needed if VisitDate was read as a factor
# finally, collapse the items into one line, which can replace the third line of the patient record
paste(out, collapse=" ")
[1] "DATE OF SERVICE: 4/3/2018"
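
A side note on the NA produced by the loop in the original post (quoted below). This is only a minimal sketch of what is probably going on, assuming the ID line looks exactly like "PATIENT ID #: 12345". The small data frame is a hypothetical stand-in for DateR.csv, and raw_id, clean_id and fixed are names made up for illustration. Splitting on ":" leaves the blank after the colon in the ID, so it does not match any row name.

```r
# Hypothetical stand-in for DateR.csv, with row names set from Id2
# as in the original post
clinicVdate <- data.frame(Id2 = c(12345, 11111),
                          VisitDate = c("4/3/2018", "5/4/2018"),
                          stringsAsFactors = FALSE)
rownames(clinicVdate) <- as.character(clinicVdate[, "Id2"])

input_line <- "PATIENT ID #: 12345"

# Splitting on ":" keeps the blank that follows the colon
raw_id <- strsplit(input_line, ":")[[1]][2]
raw_id                             # " 12345", note the leading space
clinicVdate[raw_id, "VisitDate"]   # NA, no row is named " 12345"

# Trimming the whitespace gives a key that matches the row names
clean_id <- trimws(raw_id)
fixed <- clinicVdate[clean_id, "VisitDate"]
fixed                              # "4/3/2018"
```

If this is indeed the cause, wrapping the strsplit() result in trimws() inside the loop should be enough to get a date instead of NA.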

But what do you want to do with the collapsed line? The code above only manipulates objects in your R session, not the original files. I believe other tools are better suited to such tasks.
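
If you do want to stay in R, the modified lines can be written out with writeLines(). Below is a rough sketch rather than a tested solution for your real data. It assumes that each record has its "PATIENT ID #:" line before its "DATE OF SERVICE:" line and that both start in the first column. fix_service_dates() is a hypothetical helper name, and the temporary files stand in for test3.NEW and a corrected copy.

```r
# Hypothetical helper: replace every DATE OF SERVICE line using the most
# recently seen PATIENT ID line; `dates` needs columns Id2 and VisitDate
fix_service_dates <- function(lines, dates) {
  current_id <- NA_character_
  for (i in seq_along(lines)) {
    if (grepl("^PATIENT ID #:", lines[i])) {
      # keep only what follows the colon, without surrounding blanks
      current_id <- trimws(sub("^PATIENT ID #:", "", lines[i]))
    } else if (grepl("^DATE OF SERVICE:", lines[i])) {
      hit <- match(current_id, as.character(dates$Id2))
      # lines whose ID is not found in the csv data are left unchanged
      if (!is.na(hit)) {
        lines[i] <- paste("DATE OF SERVICE:",
                          as.character(dates$VisitDate[hit]))
      }
    }
  }
  lines
}

# Self-contained demonstration on temporary files (assumed stand-ins)
dates <- data.frame(Id2 = c(12345, 11111),
                    VisitDate = c("4/3/2018", "5/4/2018"))
infile  <- tempfile(fileext = ".NEW")
outfile <- tempfile(fileext = ".NEW")
writeLines(c("PATIENT NAME: CONFIDENTIAL,#12345",
             "PATIENT ID #: 12345",
             "DATE OF SERVICE: 04/10/2018",
             "ACCESSION NUMBER: RR1234567"), infile)

fixed <- fix_service_dates(readLines(infile), dates)
writeLines(fixed, outfile)   # the input file itself is left untouched
readLines(outfile)[3]
```

Writing to a separate output file keeps the original record intact, so the result can be checked before anything is overwritten.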

Cheers
Petr

> -----Original Message-----
> From: Nicola Cecchino <ncecchino using gmail.com>
> Sent: Thursday, September 13, 2018 5:04 AM
> To: PIKAL Petr <petr.pikal using precheza.cz>
> Subject: Re: [R] Correcting dates in research / medical record using R
>
> Hi Petr,
>
> Thank you for your help but I'm not sure what that code is supposed to do?  I'm
> really new to regular expressions and am having difficulties with this whole
> thing.
>
> Nic
>
>
>
>
> On 9/12/2018 2:26 AM, PIKAL Petr wrote:
> > Hi
> >
> > First of all, you should not post in HTML format; there is a big chance that the message gets scrambled.
> >
> > You should compare your Id2 after the for loop with the result of
> >
> > paste('DATE OF SERVICE', clinicVdate[Id2, 'VisitDate'], sep=':')
> >
> > Most probably Id2 after the for loop does not match the row names of clinicVdate.
> >
> > Cheers
> > Petr
> >
> >
> >> -----Original Message-----
> >> From: R-help <r-help-bounces using r-project.org> On Behalf Of Nicola
> >> Cecchino
> >> Sent: Wednesday, September 12, 2018 3:50 AM
> >> To: R-help using r-project.org
> >> Subject: [R] Correcting dates in research / medical record using R
> >>
> >> Hi,
> >>
> >> I'm not that well versed with R - I'm trying to correct the dates of
> >> service in a de-identified research medical record of several subjects.
> >> The correct dates come from a csv file, in the VisitDate column,
> >> that looks like this in Excel.  The empty cells have other data in
> >> them that I don't need and the  file name is DateR.csv:
> >>
> >>
> >> Id1    Id2    ...  VisitDate
> >> 12345  12345  ...  4/3/2018
> >>
> >>
> >> The research medical record is a text file and the "DATE OF SERVICE"
> >> in the top matter is in error in all of the subjects and needs to be
> >> replaced with the "VisitDate" in the csv file.  The file name for the
> >> medical records is test3.NEW.  Here is a screen grab of the top
> >> matter of the research medical record; below this data excerpt is
> >> other gathered data for that subject:
> >>
> >>
> >>
> >> ================================================================================
> >>
> >> PATIENT NAME: CONFIDENTIAL,#12345
> >> PATIENT ID #: 12345
> >> DATE OF SERVICE: 04/10/2018
> >> ACCESSION NUMBER: RR1234567
> >>
> >> TEST PROCEDURE        HIGH/LOW  TEST RESULTS       UNITS NORMAL VALUES
> >>
> >>
> >> As described above, I need to update the text file DATE OF SERVICE:
> >> date with the VisitDate in the csv file.
> >>
> >> I made several unsuccessful attempts at this, so now I turn to you.
> >> Here is the code that exhibits my attempts:
> >>
> >>
> >> clinicVdate <- read.csv("DateR.csv")
> >>
> >> rownames(clinicVdate) <- as.character(clinicVdate[,'Id2'])
> >>
> >> Id2 <- NA
> >>
> >> input_data <- readLines("D:/test/test3.NEW")
> >> output_data <- c()
> >>
> >> for(input_line in input_data){
> >>     output_line = input_line
> >>     if(length(grep('PATIENT ID #:', input_line))>0)  {
> >>       Id2 = as.character(strsplit(input_line, ':')[[1]][2])
> >>     }
> >>
> >>     if (length(grep( 'DATE OF SERVICE: ', input_line))){
> >>       output_line = paste('DATE OF SERVICE', clinicVdate[Id2, 'VisitDate'], sep=':')
> >>     }
> >>     output_data = paste(output_data, output_line, sep='\n')
> >> }
> >>
> >> cat(output_data)
> >>
> >>
> >> The results of the above remove the erroneous date and replace it
> >> with NA.  Here is an example of the results:
> >>
> >>
> >>
> >> ================================================================================
> >>
> >> PATIENT NAME: CONFIDENTIAL,#12345
> >> PATIENT ID #: 12345
> >> DATE OF SERVICE: NA
> >> ACCESSION NUMBER: RR1234567
> >>
> >> TEST PROCEDURE        HIGH/LOW  TEST RESULTS       UNITS NORMAL VALUES
> >>
> >>
> >> Where am I going wrong?  If I didn't pose my question appropriately,
> >> please let me know too!!  Any help with this would be greatly appreciated!!
> >>
> >> Kind regards,
> >>
> >> Nic Cecchino
> >>
> >>
> >>
> >>
> >> [[alternative HTML version deleted]]
> >>
> >> ______________________________________________
> >> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.

Personal data: Information about the processing and protection of business partners' personal data is available at: https://www.precheza.cz/en/personal-data-protection-principles/ (Czech version: https://www.precheza.cz/zasady-ochrany-osobnich-udaju/)
Confidentiality: This email and any documents attached to it may be confidential and are subject to the legally binding disclaimer: https://www.precheza.cz/en/01-disclaimer/ (Czech version: https://www.precheza.cz/01-dovetek/)