[R] parsing text files

jim holtman jholtman at gmail.com
Fri Mar 9 14:33:47 CET 2012


Here is one way of doing it; it reads the file and creates a 'long' version.

##########
input <- file("/temp/ClinicalReports.txt", 'r')
outFile <- '/temp/output.txt'  #  tempfile()
output <- file(outFile, 'w')
writeLines("ID, Date, variable, value", output)
ID <- NULL
dataSw <- NULL
repeat{
    line <- readLines(input, n = 1)
    if (length(line) == 0) break
    if (!is.null(dataSw)){
        if (line == ''){  # end of data
            ID <- NULL
            dataSw <- NULL
            next
        }
        # now write CSV output file
        cat(ID
          , ','
          , Date
          , ','
          , substring(line, 1, 31)
          , ','
          , substring(line, 32, 43)
          , '\n'
          , sep = ''
          , file = output
          )
        next
    }
    if (grepl("Acc.ne", line)){
        ID <- (substring(line, 29,35))
        Date <- (substring(line, 52,61))
        next
    }
    if (!is.null(ID)){  # looking for Esame
        if (grepl("Esame", line)){
            # skip two lines
            readLines(input, n = 2)
            dataSw <- 1
            next
        }
    }

}

# close both connections, then read the data back in, in a long format
close(input)
close(output)
result <- read.csv(outFile, as.is = TRUE)
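
To see what the two substring() calls in the loop above are doing, here is a small sketch on a made-up line. The layout (name in columns 1-31, value in columns 32-43) is what the loop assumes; the sample line itself is hypothetical, built with sprintf() rather than taken from your file.

```r
# Hypothetical data line: name padded to 31 characters,
# value right-justified in the next 12 characters
line <- sprintf("%-31s%12s", "AZOTEMIA", "33.6")

name  <- substring(line, 1, 31)    # columns 1-31, trailing blanks kept
value <- substring(line, 32, 43)   # columns 32-43, leading blanks kept

# strip the padding before using the fields
name  <- sub(" +$", "", name)
value <- as.numeric(value)         # as.numeric ignores the leading blanks
```

The loop writes the fields out with the padding still attached, which is why the 'variable' column comes back with trailing blanks; you can trim it after reading with the same sub() call.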


The result from your test data is:

> str(result)
'data.frame':   43 obs. of  4 variables:
 $ ID      : int  185 185 185 185 185 185 185 185 185 185 ...
 $ Date    : chr  "05/12/2011" "05/12/2011" "05/12/2011" "05/12/2011" ...
 $ variable: chr  "AZOTEMIA                       " "CREATININEMIA                  " "SODIEMIA                       " "POTASSIEMIA                    " ...
 $ value   : num  33.6 0.99 136 4.22 94.2 8.68 1.87 1.79 189 118 ...
> head(result)
   ID       Date                        variable  value
1 185 05/12/2011 AZOTEMIA                         33.60
2 185 05/12/2011 CREATININEMIA                     0.99
3 185 05/12/2011 SODIEMIA                        136.00
4 185 05/12/2011 POTASSIEMIA                       4.22
5 185 05/12/2011 CLOREMIA                         94.20
6 185 05/12/2011 CALCEMIA                          8.68
>
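
If you want the one-row-per-report layout asked for in the quoted message below, something along these lines might work. It is only a sketch: the small data frame is a hand-built stand-in for 'result' using values from the head() output above, and the column clean-up assumes the names that reshape() generates by default.

```r
# Stand-in for 'result' (a few rows copied from the head() output above)
result <- data.frame(
    ID = 185L,
    Date = "05/12/2011",
    variable = c("AZOTEMIA   ", "CREATININEMIA   ", "SODIEMIA   ", "POTASSIEMIA   "),
    value = c(33.6, 0.99, 136, 4.22),
    stringsAsFactors = FALSE
)

# drop the trailing blanks left over from the fixed-width fields
result$variable <- sub(" +$", "", result$variable)

# one row per ID/Date, one column per variable
wide <- reshape(result
              , idvar = c("ID", "Date")
              , timevar = "variable"
              , direction = "wide"
              )

# reshape() names the new columns 'value.<variable>'; remove the prefix
names(wide) <- sub("^value\\.", "", names(wide))
```

Note that read.csv turned the ID into an integer (185); if you need the leading zeros, read that column as character instead.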


On Thu, Mar 8, 2012 at 8:24 AM, ginger <biino at igm.cnr.it> wrote:
> Ooops,
> I forgot to specify that each row, containing the records of one clinical
> report, has to list the values of the 22 parameter measurements.
> For example, first row, first few columns:
> ID        DATE         GLICEMIA   AZOTEMIA   CREATININEMIA   SODIEMIA   ...
> 0000185   05/12/2011   115        33.6       0.99            136        ...



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


