[R] hdf5 package segfault when processing large data

Budi Mulyono budi.mulyono at alvantage.com
Mon Aug 24 12:37:53 CEST 2009


Hi there,

I am currently working on something that uses the hdf5 library. I think
hdf5 is a great data format, and I've used it fairly extensively in
Python via PyTables. I was looking for something similar in R.
The closest I could find is the hdf5 package. It does not work
the same way as PyTables, but it is good enough to let the two
exchange data via an hdf5 file.

There is just one problem: I keep getting a segfault when trying to
process large files (>10MB), although this is by no means large given
hdf5's capabilities. I have included the example code and
data below. I have tried different OSes (WinXP and Ubuntu 8.04),
architectures (32- and 64-bit) and R versions (2.7.1, 2.7.2, and 2.9.1),
but all of them show the same problem. I was wondering if anyone
has any clue as to what's going on here and can advise me on how to
handle it.

Thank you, I appreciate any help I can get.

Cheers,

Budi

The example script
====================
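library(hdf5)
# Read the tab-separated sample, keeping strings as character vectors
fileName <- "sample.txt"
myTable <- read.table(fileName, header=TRUE, sep="\t", as.is=TRUE)
# Write the object named "myTable" to test.hdf
hdf5save("test.hdf", "myTable")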

========
The data example; the list continues for more than 250,000 rows: sample.txt
========
Date	Time	f1	f2	f3	f4	f5
20070328	07:56	463	463.07	462.9	463.01	1100
20070328	07:57	463.01	463.01	463.01	463.01	200
....
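
In case it helps anyone try this without my file, here is a rough sketch that writes a tab-separated file with the same columns and roughly the same number of rows. The helper name make_sample, the seed, and all generated values are made up for illustration; they are not my real data, and I have not checked that the synthetic file behaves the same way as the original.

```r
# Hypothetical helper: writes a tab-separated file shaped like sample.txt.
# All values are synthetic placeholders, not the original data.
make_sample <- function(path, n = 250000L, seed = 1L) {
  set.seed(seed)
  minutes <- seq_len(n) - 1L
  # Clock time cycling through the day, formatted as HH:MM
  time <- sprintf("%02d:%02d", (minutes %/% 60L) %% 24L, minutes %% 60L)
  # One calendar date per 1440 rows, formatted as YYYYMMDD
  date <- format(as.Date("2007-03-28") + minutes %/% 1440L, "%Y%m%d")
  # Random prices near 463, rounded to two decimals
  price <- function() round(463 + runif(n, -1, 1), 2)
  df <- data.frame(Date = date, Time = time,
                   f1 = price(), f2 = price(), f3 = price(), f4 = price(),
                   f5 = sample(1:20, n, replace = TRUE) * 100,
                   stringsAsFactors = FALSE)
  write.table(df, path, sep = "\t", quote = FALSE, row.names = FALSE)
  invisible(df)
}

make_sample("sample.txt")
```

Lowering n (for example make_sample("small.txt", n = 1000L)) gives a small file for comparison against the large one.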



