[R] reading heterogeneous CSV
Allen S. Rout
asr at ufl.edu
Tue Aug 11 20:55:06 CEST 2009
Greetings, all.
I've got a datafile I've been working with that has an idiosyncratic,
heterogeneous format. It looks roughly like this:
[...]
DISKREAD,metadata about disks
MEM,metadata about memory
ZZZZ,observation-identifier,time,date
DISKREAD,observation-identifier,data about disks
MEM,observation-identifier,data about memory
[ and repeat for each observation ]
What I've done in the past is to take the monolithic file and
preprocess it into files, one per observation type. The observation
types are structurally self-similar, so once I have them split up,
normal read.csv methods work just fine. Then I read the ZZZZ file to
get timestamps, and whichever observation files I care about on this
run.
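For comparison, that split-into-files step can also be done inside R itself. This is only a sketch: it assumes the record tag is always the first comma-separated field, the sample lines are made up, and the per-type metadata lines are left out to keep it short.

```r
# Hypothetical sample; in practice this would come from readLines() on the real file.
lines <- c("ZZZZ,T0001,12:00:00,11-AUG-2009",
           "DISKREAD,T0001,12.5,0.0,3.1",
           "MEM,T0001,2048,512",
           "ZZZZ,T0002,12:00:10,11-AUG-2009",
           "DISKREAD,T0002,10.0,1.5,2.2",
           "MEM,T0002,2040,520")

tags <- sub(",.*$", "", lines)   # record tag = everything before the first comma
groups <- split(lines, tags)     # named list: one character vector per tag

# Write each group to its own temporary file (the "one file per type" step).
paths <- vapply(names(groups), function(tag) {
  path <- tempfile(pattern = paste0(tag, "_"), fileext = ".csv")
  writeLines(groups[[tag]], path)
  path
}, character(1))

# Each file is now rectangular, so read.csv() handles it as usual.
tables <- lapply(paths, read.csv, header = FALSE)
str(tables$ZZZZ)
```

Note that this writes every slice out and reads it back, so it keeps the extra pass the post is trying to get rid of; it only shows the baseline.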
But ideally, I'd like to do this entire operation with R features, and
without multiple passes through the file.
The number of fields per line varies wildly, so a plain read.table doesn't help.
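One way to see the problem is count.fields(), which reports how many fields each line has. The three lines below are hypothetical. read.table() infers its column count from the first few lines of input, so rows of differing width tend to cause errors or, with fill = TRUE, can be padded or wrapped into extra rows.

```r
# Hypothetical tagged lines whose field counts differ by record type
lines <- c("ZZZZ,T0001,12:00:00,11-AUG-2009",
           "DISKREAD,T0001,12.5,0.0,3.1,4.7,0.2",
           "MEM,T0001,2048,512")

con <- textConnection(lines)
n <- count.fields(con, sep = ",")   # fields per line: 4 7 4
close(con)
print(n)
```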
I was envisioning the following:
+ create a FIFO for each desired observation class, including the ZZZZ metadata
+ In one pass through the source file, populate the FIFOs with their data
+ read.csv the output sides of the FIFOs.
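The three steps above could be sketched roughly as follows, with one swap: write-mode textConnection()s stand in for the FIFOs. That substitution is an assumption on my part, made because a pipe has a finite buffer, so pushing an entire slice into a FIFO before anything reads from it can stall or fail on a large file; it also assumes the selected slices fit in memory. The input file and its contents are made up.

```r
# Hypothetical input file
path <- tempfile(fileext = ".csv")
writeLines(c("ZZZZ,T0001,12:00:00,11-AUG-2009",
             "DISKWRITE,T0001,1.5,0.0,0.3",
             "MEM,T0001,2048,512",
             "ZZZZ,T0002,12:00:10,11-AUG-2009",
             "DISKWRITE,T0002,2.5,0.1,0.4",
             "MEM,T0002,2040,520"), path)

desired_slices <- c("ZZZZ", "DISKWRITE")

# Step 1: one output text connection per desired slice. With object = NULL the
# text accumulates internally and is retrieved later with textConnectionValue().
sinks <- lapply(desired_slices, function(s) textConnection(NULL, open = "w"))
names(sinks) <- desired_slices

# Step 2: a single pass over the source file, read in blocks of lines.
con <- file(path, open = "r")
repeat {
  chunk <- readLines(con, n = 1000)
  if (length(chunk) == 0) break
  tags <- sub(",.*$", "", chunk)       # record tag = first field
  for (s in desired_slices) {
    hits <- chunk[tags == s]
    if (length(hits) > 0) writeLines(hits, sinks[[s]])
  }
}
close(con)

# Step 3: each slice is now rectangular text, so read.csv() parses it directly.
slices <- lapply(sinks, function(tc) {
  vals <- textConnectionValue(tc)      # fetch the text before closing
  close(tc)
  read.csv(text = vals, header = FALSE)
})
str(slices$ZZZZ)
```

If a slice might be absent from the file, textConnectionValue() returns character(0) and read.csv() would likely stop with a "no lines available" error, so a length check before Step 3 may be worth adding.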
But I have problems right out of the gate: when I set a data.frame
element to the output of fifo(), what actually gets inserted seems to
be an integer; I am guessing it's being turned into a factor.
example:
----
desired_slices=c("ZZZZ","DISKWRITE")
temps = data.frame(slice=desired_slices,row.names=1,handle=I(""))
temps["ZZZZ",] = fifo("./ZZZZ",open="w+")
showConnections()
# you can see that the connection is open
temps
# you can see that the data.frame cell contains the connection (filehandle) number
-----
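A possible explanation for the integer, offered as a hedged aside: an R connection object is a length-1 integer (the connection number) carrying a class attribute, so once the class is stripped only the number is left. The sketch below shows that, shows getConnection() mapping a bare number back to the live connection, and stores connections in a named list, which keeps the whole object. Temporary files via file() stand in for fifo() here (assumption: fifo("./ZZZZ", open = "w+") objects would be stored the same way, and file() also runs where FIFOs are unavailable).

```r
desired_slices <- c("ZZZZ", "DISKWRITE")

# 1. A connection is a length-1 integer plus a class attribute.
probe <- file(tempfile(), open = "w")
print(class(probe))            # "file" "connection"
num <- as.integer(probe)       # as.integer() drops the attributes, keeping the number
stopifnot(is.integer(unclass(probe)))

# 2. If only the number survived, getConnection() recovers the live object.
same <- getConnection(num)
close(same)

# 3. A named list stores each connection object whole, class included.
paths <- vapply(desired_slices, function(s) tempfile(s), character(1))
handles <- lapply(paths, file, open = "w+")
writeLines("ZZZZ,T0001,12:00:00,11-AUG-2009", handles[["ZZZZ"]])
invisible(lapply(handles, close))
```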
Am I just barking up the wrong tree?
- Allen S. Rout