[R] Trouble pulling data from a messy ASCII file...
jim holtman
jholtman at gmail.com
Fri Dec 19 03:49:18 CET 2008
Here is an example of some code that might do it for you:
> input <- readLines(textConnection("19 c:/data/WF-100/2008/20080911/trk/20080911.013115.007.17.txt
+ 10 s name of program that wrote this file trkplt name of program that wrote this file
+ 10 GORDON machine that generated this file machine that generated this file
+ 10 3.7 version of program
+ 10 3.6 version of this data file
+ 10 5.81 version of Universal Library
+ 10 20081121.145730 when this file was written
+ 10 Windows_XP operating system used operating system used
+ *
+ * radar characteristics
+ 11 WF-100
+ 11 20000000 A/D rate, samples/second
+ 11 7.5 bin width, m
+ 11 800 nominal PRF, Hz
+ 11 0.25 nominal pulse width, microsec
+ 11 0 tuning, volts
+ 11 3.19779 nominal wave length, cm"))
> closeAllConnections()
>
> # parse out the data
> f.parse <- function(line){
+ x <- sub("^(\\S+)\\s+(\\S+)\\s*(.*)", "\\1`\\2`\\3", line)
+ unlist(strsplit(x, "`"))
+ }
>
> fileName <- ''
> result <- NULL
> for (i in input){
+ values <- f.parse(i)
+ switch(values[1],
+ '19'={fileName <- values[2]},
+ '*'=NULL, # ignore comments
+ '10'=,
+ '11'={result <- rbind(result, c(fileName, values[3], values[2]))}
+ )
+ }
> # convert to dataframe for 'cast'
> result <- as.data.frame(result, stringsAsFactors=FALSE)
> names(result) <- c('fileName', 'variable', 'value')
> require(reshape)
> cast(result, fileName ~ variable, c)
                                                     fileName A/D rate, samples/second bin width, m
1 c:/data/WF-100/2008/20080911/trk/20080911.013115.007.17.txt                 20000000          7.5
  machine that generated this file machine that generated this file
1                                                            GORDON
  name of program that wrote this file trkplt name of program that wrote this file nominal PRF, Hz
1                                                                                s             800
  nominal pulse width, microsec nominal wave length, cm operating system used operating system used
1                          0.25                 3.19779                                  Windows_XP
  tuning, volts version of program version of this data file version of Universal Library
1             0                3.7                       3.6                         5.81
  when this file was written     NA
1            20081121.145730 WF-100
>
>
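To scale this to the roughly 1000 files mentioned in the original question, the same parsing could be wrapped in a function that returns one named row per file. What follows is only a sketch in base R (no 'reshape'), under some assumptions: parse_radar_lines() and the "code_<n>" placeholder column name are made up for illustration, and every file is assumed to follow the layout shown above (a '19' line holding the path, '10'/'11' records, '*' comment lines).

```r
# Hypothetical helper (not from any package): turn the lines of one file
# into a named character vector, one element per "10"/"11" record.
parse_radar_lines <- function(lines) {
  # split each line into: record code, value, optional description
  m <- regmatches(lines, regexec("^\\s*(\\S+)\\s+(\\S+)\\s*(.*)$", lines))
  m <- m[lengths(m) == 4]                 # drops blank and bare "*" lines
  code  <- vapply(m, `[`, character(1), 2)
  value <- vapply(m, `[`, character(1), 3)
  desc  <- trimws(vapply(m, `[`, character(1), 4))

  # records without a description (e.g. "11 WF-100") get a placeholder name
  empty <- desc == ""
  desc[empty] <- paste0("code_", code[empty])

  keep <- code %in% c("10", "11")         # "* ..." comment lines fall out here
  c(fileName = basename(value[code == "19"][1]),
    setNames(value[keep], make.unique(desc[keep])))
}

# A shortened copy of the sample file from the question
lines <- c(
  "19 c:/data/WF-100/2008/20080911/trk/20080911.013115.007.17.txt",
  "10 3.7 version of program",
  "10 3.6 version of this data file",
  "10 20081121.145730 when this file was written",
  "*",
  "* radar characteristics",
  "11 WF-100",
  "11 7.5 bin width, m"
)

# With real files this list would come from something like
#   lapply(list.files(dir, full.names = TRUE), readLines)
# where 'dir' is whatever directory holds the data files.
rows <- list(parse_radar_lines(lines))

# Align every file on the union of column names, one row per file
cols <- unique(unlist(lapply(rows, names)))
tab <- do.call(rbind, lapply(rows, function(r) unname(r[cols])))
tab <- as.data.frame(tab, stringsAsFactors = FALSE)
names(tab) <- cols

# values are read as text; convert the numeric ones before summarising
tab[["bin width, m"]] <- as.numeric(tab[["bin width, m"]])
```

From there, write.csv(tab, ..., row.names = FALSE) would give a table like the one asked for, and the long descriptive column names could be renamed to shorter ones with names(tab) before writing.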
On Wed, Dec 17, 2008 at 12:21 PM, Titan8883 <jplaney at gmail.com> wrote:
>
> The output I would be looking for would be one row for each data file with
> columns for each variable, so using a .csv example with a few variables
> would be:
> -------------------------------------------------------------------------
> File_name,date_written,program_ver,data_file_ver,bin_width
> 20080911.013115.007.17.txt, 20081121.145730,3.7,3.6,7.5
> --------------------------------------------------------------------------
> My plan is to create a table with all the data files listed. This would
> allow me to find mean/min/max values for different variables, sort by a
> certain variable, etc. I am not limiting myself to R; I have seen awk
> mentioned before, so that sounds like it is worth looking at to prep the
> data.
>
> Hope that helps.
>
> jholtman wrote:
>>
>> It would be helpful if you could show what the output would be for the
>> example given. Exactly what are the 'values' and what would be the
>> 'headings'? As mentioned before, you can use readLines and then parse
>> the data you want, though something like Perl might be easier; it is
>> hard to tell from the mail.
>>
>> On Wed, Dec 17, 2008 at 2:37 PM, Titan8883 <jplaney at gmail.com> wrote:
>>>
>>> Hi all,
>>>
>>> I am a new graduate student who is also new to R. I am OK with the basics,
>>> but the problem I am having right now seems beyond what I can do, so I am
>>> looking for advice. I am trying to pull data from flat ASCII files, but they
>>> do not have a "nice" structure, so a simple "read.table" doesn't work. An
>>> example of the first half of a data file is below:
>>> ----------------------------------------------------------------------------------------------
>>> 19 c:/data/WF-100/2008/20080911/trk/20080911.013115.007.17.txt
>>> 10 s name of program that wrote this file trkplt name of program that wrote this file
>>> 10 GORDON machine that generated this file machine that generated this file
>>> 10 3.7 version of program
>>> 10 3.6 version of this data file
>>> 10 5.81 version of Universal Library
>>> 10 20081121.145730 when this file was written
>>> 10 Windows_XP operating system used operating system used
>>> *
>>> * radar characteristics
>>> 11 WF-100
>>> 11 20000000 A/D rate, samples/second
>>> 11 7.5 bin width, m
>>> 11 800 nominal PRF, Hz
>>> 11 0.25 nominal pulse width, microsec
>>> 11 0 tuning, volts
>>> 11 3.19779 nominal wave length, cm
>>> -----------------------------------------------------------------------------------------------
>>> ..the file goes on from there...
>>>
>>> How would I go about getting this data into some kind of useful format? This
>>> is one of about 1000 files I will need to go through. I would ideally like
>>> to get these into a format with each data file as a row, with columns for the
>>> various values and the description text removed (version of program, file
>>> version, tuning volts, etc.).
>>>
>>> I'm not looking for a cut-and-paste answer, but perhaps some direction on
>>> where I should start. I have only done basic .csv, table, and line inputs up
>>> until now.
>>>
>>> Thanks for any advice
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Trouble-pulling-data-from-a-messy-ASCII-file...-tp21059239p21059239.html
>>> Sent from the R help mailing list archive at Nabble.com.
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>>
>> --
>> Jim Holtman
>> Cincinnati, OH
>> +1 513 646 9390
>>
>> What is the problem that you are trying to solve?
>
> --
> View this message in context: http://www.nabble.com/Trouble-pulling-data-from-a-messy-ASCII-file...-tp21059239p21060639.html
> Sent from the R help mailing list archive at Nabble.com.
>
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?