[R] reading heterogeneous CSV

Allen S. Rout asr at ufl.edu
Tue Aug 11 20:55:06 CEST 2009



Greetings, all.

I've got a datafile I've been working with that has an idiosyncratic,
heterogeneous format.  It's roughly like:


[...]
DISKREAD,metadata about disks
MEM,metadata about memory

ZZZZ,observation-identifier,time,date
DISKREAD,observation-identifier,data about disks
MEM,observation-identifier,data about memory

[ and repeat for each observation ]

What I've done in the past is take the monolithic file and
preprocess it into files, one per observation type.  The lines of a
given observation type all share the same structure, so once I have
them split up, normal read.csv methods work just fine.  Then I read the
ZZZZ file to get timestamps, plus whichever observation files I care
about on this run.


But ideally, I'd like to do this entire operation with R features, and
without multiple passes through the file.

The line lengths vary wildly between observation types, so a single
read.table over the whole file doesn't help.


I was visualizing the following: 

+ create a FIFO for each desired observation class, including the ZZZZ metadata
+ In one pass through the source file, populate the FIFOs with their data
+ read.csv the output sides of the FIFOs. 
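
For illustration, the one-pass split described above can be sketched
without FIFOs by grouping the lines in memory. This swaps in a different
technique (readLines-style character vectors plus split and
textConnection), not the FIFO plan itself. The sample lines, their field
values, and the names `lines`, `by_type` and `frames` are made up for the
example.

```r
# Hypothetical sample in the format described above; real input would
# come from readLines() on the monolithic file.
lines <- c(
  "ZZZZ,T0001,00:00:05,11-AUG-2009",
  "DISKREAD,T0001,1.5,2.5,0.0",
  "MEM,T0001,512,128",
  "ZZZZ,T0002,00:01:05,11-AUG-2009",
  "DISKREAD,T0002,3.0,4.0,1.0",
  "MEM,T0002,500,140"
)

# The record type is everything before the first comma.
type <- sub(",.*$", "", lines)

# Group the lines by record type.
by_type <- split(lines, type)

# Keep only the slices of interest and parse each group on its own;
# within a group every row has the same number of fields.
desired_slices <- c("ZZZZ", "DISKREAD")
frames <- lapply(by_type[desired_slices], function(x) {
  tc <- textConnection(x)
  on.exit(close(tc))
  read.csv(tc, header = FALSE, stringsAsFactors = FALSE)
})
```

Metadata lines that share a prefix with the observation lines (the
"DISKREAD,metadata about disks" kind) would presumably need to be
filtered out before parsing, since their field counts differ.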


But I have problems right out of the gate: when I set a data.frame
element to the output of fifo(), what actually gets inserted seems to
be an integer; I am guessing it's being turned into a factor.  


example: 
----
desired_slices=c("ZZZZ","DISKWRITE")
temps = data.frame(slice=desired_slices,row.names=1,handle=I(""))

temps["ZZZZ",] = fifo("./ZZZZ",open="w+")        
showConnections()
 ( you can see that the connection is open)
temps
 ( you can see that the content of the data.frame cell is the filehandle number)
-----
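
For comparison, a minimal sketch of keeping the handles in a plain named
list rather than a data.frame. A connection in R is an integer index
carrying class attributes, and a list element keeps the whole object.
The sketch uses file() on temporary paths instead of fifo() so that it
runs on any platform; that substitution, and the names `paths` and
`handles`, are assumptions made for the example.

```r
desired_slices <- c("ZZZZ", "DISKWRITE")

# One temporary file per slice stands in for the FIFOs (assumption:
# file() connections are stored the same way fifo() ones would be).
paths <- file.path(tempdir(), desired_slices)

# A named list stores each connection with its class attribute intact.
handles <- lapply(paths, file, open = "w")
names(handles) <- desired_slices

# The stored element is still usable as a connection.
writeLines("ZZZZ,T0001,00:00:05,11-AUG-2009", handles[["ZZZZ"]])

# Underneath, a connection is an integer index plus attributes.
is.integer(unclass(handles[["ZZZZ"]]))
inherits(handles[["ZZZZ"]], "connection")

# Close everything when done.
invisible(lapply(handles, close))
```

Lookup by slice name then works as `handles[["ZZZZ"]]`, which covers the
same need as the row-named data.frame in the example above.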

Am I just barking up the wrong tree? 



- Allen S. Rout



