[R] hdf5 package segfault when processing large data
William Dunlap
wdunlap at tibco.com
Mon Aug 24 19:41:28 CEST 2009
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Budi Mulyono
> Sent: Monday, August 24, 2009 3:38 AM
> To: r-help at r-project.org
> Subject: [R] hdf5 package segfault when processing large data
>
> Hi there,
>
> I am currently working on something that uses the hdf5 library. I think
> hdf5 is a great data format; I've used it somewhat extensively in
> Python via PyTables. I was looking for something similar in R.
> The closest I can get is this package: hdf5. While it does not work
> the same way as PyTables, it's good enough to let them
> exchange data via an hdf5 file.
>
> There is just one problem: I keep getting a segfault when trying to
> process large files (>10MB), although this is by no means large when we
> talk about hdf5 capabilities. I have included the example code and
> data below. I have tried different OSes (WinXP and Ubuntu 8.04),
> architectures (32- and 64-bit) and R versions (2.7.1, 2.7.2, and 2.9.1),
> but all of them present the same problem. I was wondering if anyone
> has any clue as to what's going on here and can advise me on how to
> handle it.
This sort of problem should be sent to the package's maintainer.
> packageDescription("hdf5")
Package: hdf5
Version: 1.6.9
Title: HDF5
Author: Marcus G. Daniels mdaniels at lanl.gov
Maintainer: Marcus G. Daniels <mdaniels at lanl.gov>
Description: Interface to the NCSA HDF5 library
...
This is probably due to the code in hdf5.c allocating a huge
matrix, buf, on the stack with
883   unsigned char buf[rowcount][size];
It dies with a segmentation fault (a stack overflow, in particular)
at line 898, where it tries to access this buf.
885   for (ri = 0; ri < rowcount; ri++)
886     for (pos = 0; pos < colcount; pos++)
887       {
888         SEXP item = VECTOR_ELT (val, pos);
889         SEXPTYPE type = TYPEOF (item);
890         void *ptr = &buf[ri][offsets[pos]];
891
892         switch (type)
893           {
894           case REALSXP:
895             memcpy (ptr, &REAL (item)[ri], sizeof (double));
896             break;
897           case INTSXP:
898             memcpy (ptr, &INTEGER (item)[ri], sizeof (int));
899             break;
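To see why a table of this size can overflow the stack, here is a rough size check. It is only a sketch: the 56-byte row size (seven 8-byte fields) and the 8 MB stack limit are assumptions for illustration, not values taken from hdf5.c or from your machine, and the function names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes that `unsigned char buf[rowcount][row_size]` would occupy. */
size_t table_bytes(size_t rowcount, size_t row_size)
{
    /* Guard against overflow in the multiplication below. */
    assert(row_size == 0 || rowcount <= SIZE_MAX / row_size);
    return rowcount * row_size;
}

/* Returns 1 if the table fits within stack_limit bytes, 0 otherwise.
   This ignores whatever stack the rest of the program already uses,
   so it is an optimistic estimate. */
int fits_on_stack(size_t rowcount, size_t row_size, size_t stack_limit)
{
    return table_bytes(rowcount, row_size) <= stack_limit;
}
```

Under those assumed numbers, 250,000 rows of 56 bytes come to 14,000,000 bytes, which does not fit in an 8 MB (8,388,608 byte) stack, while a 1,000-row table fits easily.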
The code should use one of the allocators in the R API instead
of putting the big memory block on the stack.
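A minimal sketch of what that could look like, written with plain malloc so that it compiles without the R headers. Inside the package one would presumably use R_alloc (or the Calloc/Free pair) from the R API instead. The two-field row layout and the names pack_rows_heap, packed_real and packed_int are hypothetical and are not taken from hdf5.c.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical row layout: one double followed by one int. */
enum { PACKED_ROW_SIZE = sizeof(double) + sizeof(int) };

/* Pack a double column and an int column into a row-major byte buffer
   allocated on the heap rather than on the stack.
   Returns NULL if rowcount is 0, on overflow, or if allocation fails.
   The caller releases the buffer with free(). */
unsigned char *pack_rows_heap(const double *reals, const int *ints,
                              size_t rowcount)
{
    const size_t offsets[2] = { 0, sizeof(double) };
    unsigned char *buf;
    size_t ri;

    assert(reals != NULL && ints != NULL);
    if (rowcount == 0 || rowcount > SIZE_MAX / PACKED_ROW_SIZE)
        return NULL;

    buf = malloc(rowcount * PACKED_ROW_SIZE);
    if (buf == NULL)
        return NULL;

    for (ri = 0; ri < rowcount; ri++) {
        /* Same addressing as &buf[ri][offsets[pos]] in a 2-D array. */
        unsigned char *row = buf + ri * PACKED_ROW_SIZE;
        memcpy(row + offsets[0], &reals[ri], sizeof(double));
        memcpy(row + offsets[1], &ints[ri], sizeof(int));
    }
    return buf;
}

/* Read the double field of row ri back out of the packed buffer. */
double packed_real(const unsigned char *buf, size_t ri)
{
    double v;
    memcpy(&v, buf + ri * PACKED_ROW_SIZE, sizeof v);
    return v;
}

/* Read the int field of row ri back out of the packed buffer. */
int packed_int(const unsigned char *buf, size_t ri)
{
    int v;
    memcpy(&v, buf + ri * PACKED_ROW_SIZE + sizeof(double), sizeof v);
    return v;
}
```

The copy loop is unchanged in spirit; only the storage moves from the stack to the heap, so the table size is limited by available memory rather than by the stack limit, and an allocation failure can be reported instead of crashing.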
Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com
>
> Thank you, I appreciate any help I can get.
>
> Cheers,
>
> Budi
>
> The example script
> ====================
> library(hdf5)
> fileName <- "sample.txt"
> myTable <- read.table(fileName,header=TRUE,sep="\t",as.is=TRUE)
> hdf5save("test.hdf", "myTable")
>
> ========
> A sample of the data (the list continues for more than 250,000
> rows): sample.txt
> ========
> Date Time f1 f2 f3 f4 f5
> 20070328 07:56 463 463.07 462.9 463.01 1100
> 20070328 07:57 463.01 463.01 463.01 463.01 200
> ....