[BioC] rhdf5, dataframes, and variable length strings

John Estrada [guest] guest at bioconductor.org
Mon Oct 28 22:14:04 CET 2013


Hi all.

I am working with large data frames in R that contain a mix of numbers and variable-length strings.  I've tried using the rhdf5 package to write these to disk and read them back, but I haven't been able to figure out how to use the package correctly.  The toy example below builds a data frame that causes R to segfault, at least on my machine.  I would greatly appreciate either some pointers about what I'm doing wrong or a suggestion for another way to store my data.

# Return n random alphanumeric strings, each 3 to 20 characters long.
rndString <- function(n = 1) {
  chars <- c(0:9, letters, LETTERS)
  vapply(seq_len(n), function(i) {
    paste(sample(chars, sample(3:20, 1), replace = TRUE), collapse = "")
  }, character(1))
}
library(rhdf5)
n <- 1000000
d <- data.frame(id=seq(n),name=rndString(n),val=rnorm(n),stringsAsFactors=FALSE)
h5createFile("test.h5")
h5write(d,file="test.h5",name="d")
dd <- h5read("test.h5",name="d")

John Estrada



 -- output of sessionInfo(): 

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rhdf5_2.6.0

loaded via a namespace (and not attached):
[1] zlibbioc_1.8.0


--
Sent via the guest posting facility at bioconductor.org.


