[R] reading heterogeneous CSV
ggrothendieck at gmail.com
Wed Aug 12 04:53:59 CEST 2009
This will read it all in, and then you can decide
what you want to do with it:
Lines <- "DISKREAD,metadata about disks
MEM,metadata about memory
DISKREAD,observation-identifier,data about disks
MEM,observation-identifier,data about memory"
DF <- read.table(textConnection(Lines), sep = ",", fill = TRUE)
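From there, one possible next step (only a sketch; the names n and by_type are made up for illustration) is to split on the first column. Note that with fill = TRUE, read.table guesses the column count from the first five lines only, so if later lines are wider it is safer to pass explicit col.names sized from count.fields:

```r
Lines <- "DISKREAD,metadata about disks
MEM,metadata about memory
DISKREAD,observation-identifier,data about disks
MEM,observation-identifier,data about memory"

# Find the widest record so that no line gets wrapped onto a second row.
con <- textConnection(Lines)
n <- max(count.fields(con, sep = ","))
close(con)

# Read everything, padding short lines out to n columns.
con <- textConnection(Lines)
DF <- read.table(con, sep = ",", fill = TRUE,
                 col.names = paste0("V", seq_len(n)),
                 stringsAsFactors = FALSE)
close(con)

# One data frame per record type, keyed by the first column.
by_type <- split(DF, DF$V1)
```

by_type$DISKREAD and by_type$MEM can then be cleaned up separately, for example by dropping the padded columns each type does not use.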
On Tue, Aug 11, 2009 at 2:55 PM, Allen S. Rout<asr at ufl.edu> wrote:
> Greetings, all.
> I've got a datafile I've been working with that has an idiosyncratic,
> heterogeneous format. It's grossly like:
> DISKREAD,metadata about disks
> MEM,metadata about memory
> DISKREAD,observation-identifier,data about disks
> MEM,observation-identifier,data about memory
> [ and repeat for each observation ]
> What I've done in the past was take the monolithic file, and
> preprocess it into files, one per observation type. The observation
> types are structurally self-similar, so once I have them split up,
> normal read.csv methods work just fine. Then I read the ZZZZ file to
> get timestamps, and whichever observation files I care about on this
> But ideally, I'd like to do this entire operation with R features, and
> without multiple passes through the file.
> The line lengths vary wildly, so a read.table doesn't help.
> I was visualizing the following:
> + create a FIFO for each desired observation class, including the ZZZZ metadata
> + In one pass through the source file, populate the FIFOs with their data
> + read.csv the output sides of the FIFOs.
> But I have problems right out of the gate: when I set a data.frame
> element to the output of fifo(), what actually gets inserted seems to
> be an integer; I am guessing it's being turned into a factor.
> temps = data.frame(slice=desired_slices,row.names=1,handle=I(""))
> temps["ZZZZ",] = fifo("./ZZZZ",open="w+")
> ( you can see that the connection is open)
> ( you can see that the contents of the data.frame cell is the filehandle number)
> Am I just barking up the wrong tree?
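On the integer showing up in the data frame: a connection object is an integer with a class and some attributes attached, and assigning it into a data.frame cell keeps only the integer. A plain list keeps the object intact. Below is a rough sketch of the one-pass routing idea under that assumption; it uses textConnection in place of fifo() so that it runs on any platform, and the names src, type, handles and out are hypothetical:

```r
Lines <- "DISKREAD,metadata about disks
MEM,metadata about memory
DISKREAD,observation-identifier,data about disks
MEM,observation-identifier,data about memory"

# Read the source once and take the record type from the first field.
con <- textConnection(Lines)
src <- readLines(con)
close(con)
type <- sub(",.*", "", src)

# Keep one writable connection per record type in a list, not a data.frame,
# so that each element is still a usable connection object.
handles <- list()
for (s in unique(type)) handles[[s]] <- textConnection(NULL, open = "w")

# Single pass: route each line to the connection for its type.
for (i in seq_along(src)) writeLines(src[i], handles[[type[i]]])

# Collect what was written to each connection, then close them all.
out <- lapply(handles, textConnectionValue)
invisible(lapply(handles, close))
```

Each element of out is a character vector of lines for one record type, which can be handed to read.csv through another textConnection.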
> - Allen S. Rout