[R] parsing a data file

Eric Lecoutre lecoutre at stat.ucl.ac.be
Tue Apr 27 12:09:42 CEST 2004



Hi,

I don't think there is any built-in function to do that...
Your friend is readLines() plus some "manual" post-processing.
Here is what I did (not sure it is the best way):

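tmptxt = readLines("g:/record.txt")
tmptxt = paste(tmptxt, collapse = " ")        # all lines as a single string
tmptxt = gsub("[[:space:]]+", " ", tmptxt)    # collapse runs of blanks/newlines (TeX-like)
tmptxt = sub("^ ?BEGIN ", "", tmptxt)         # drop the leading BEGIN marker
tmptxt = sub(" END ?$", "", tmptxt)           # drop the trailing END marker (keeps the last record)
tmptxt = strsplit(tmptxt, " ?RECORD ?")[[1]]  # split at each RECORD
records = tmptxt[tmptxt != ""]                # drop the empty piece before the first RECORD
# num = as.numeric(records)  # only if each record is a single number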

which you can wrap into a function:

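readRecords = function(file){
        tmptxt = readLines(file)
        tmptxt = paste(tmptxt, collapse = " ")        # all lines as a single string
        tmptxt = gsub("[[:space:]]+", " ", tmptxt)    # collapse runs of blanks/newlines
        tmptxt = sub("^ ?BEGIN ", "", tmptxt)         # drop the leading BEGIN marker
        tmptxt = sub(" END ?$", "", tmptxt)           # drop the trailing END marker
        tmptxt = strsplit(tmptxt, " ?RECORD ?")[[1]]  # split at each RECORD
        # records as a character vector; wrap in as.numeric() if each is a single number
        return(tmptxt[tmptxt != ""])
}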
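A token-based alternative, not part of the original reply, is sketched below. scan() with what = "" returns every whitespace-delimited token, so line breaks and runs of spaces are treated alike, much like TeX. The helper name readRecordsScan is hypothetical. The sketch assumes that BEGIN, RECORD and END appear as separate words and that the record data never contains the word RECORD.

```r
# Hypothetical helper (not part of base R): token-based record reader.
# Assumes BEGIN/RECORD/END are separate words and never occur inside a record.
readRecordsScan = function(file) {
  # every whitespace-delimited token; quote = "" stops ' and " from grouping tokens
  tok = scan(file, what = "", quote = "", quiet = TRUE)
  # drop the BEGIN and END markers at the ends, if present
  if (length(tok) > 0 && tok[1] == "BEGIN") tok = tok[-1]
  if (length(tok) > 0 && tok[length(tok)] == "END") tok = tok[-length(tok)]
  # each RECORD token starts a new group: 1, 2, 3, ...
  grp = cumsum(tok == "RECORD")
  keep = tok != "RECORD"
  # rejoin the tokens of each record with single spaces
  unname(sapply(split(tok[keep], grp[keep]), paste, collapse = " "))
}

# Usage on a small sample with line breaks in arbitrary places
f = tempfile()
writeLines(c("BEGIN RECORD first record", "data RECORD second",
             "record   data RECORD", "third record data", "END"), f)
res = readRecordsScan(f)
# res is c("first record data", "second record data", "third record data")
```

Because the whitespace is normalised token by token, no regular expression is needed for the line breaks. Note that a record with no data (two RECORD words in a row) is silently skipped by split().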

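Calling as.numeric() on a whole record only works when that record is a single number. If each record instead holds several whitespace-separated numbers (an assumption about the device output), each string can be split once more. The sample strings below are made up for illustration only.

```r
# Made-up sample records (assumption: whitespace-separated numbers per record)
recs = c("1.5 2 3", " 4 5.25 6", "7 8  9.5")
# trim the ends, split each record at runs of blanks, convert pieces to numbers
vals = lapply(strsplit(trimws(recs), "[[:space:]]+"), as.numeric)
# if every record has the same number of fields, stack them as matrix rows
m = do.call(rbind, vals)
```

If the records have different numbers of fields, keep the list vals instead of calling rbind().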

Eric

At 11:00 27/04/2004, Tamas Papp wrote:
>Hi,
>
>I need to parse a data file (output of a measuring device) of the
>following format:
>
>BEGIN RECORD [first record data] RECORD [second
>record data] RECORD
>[third record data]
>END
>
>Line breaks can (and do ;-() occur anywhere.  White space behaves very
>much like TeX, e.g. it is not important whether there are one or more
>spaces or line breaks as long as there is at least one of them.  It is a text
>file, not binary.
>
>I need to extract the record data I marked with []'s, e.g. a vector such
>as c("[first record data]", "[second]", ...) would be nice as a
>result.
>
>What functions should I use for this?
>
>Thanks,
>
>Tamas
>
>
>--
>Tamás K. Papp
>E-mail: tpapp at axelero.hu

Eric Lecoutre
UCL /  Institut de Statistique
Voie du Roman Pays, 20
1348 Louvain-la-Neuve
Belgium

tel: (+32)(0)10473050
lecoutre at stat.ucl.ac.be
http://www.stat.ucl.ac.be/ISpersonnel/lecoutre

If the statistics are boring, then you've got the wrong numbers. -Edward Tufte
