[BioC] rhdf5, dataframes, and variable length strings
John Estrada [guest]
guest at bioconductor.org
Mon Oct 28 22:14:04 CET 2013
Hi all.
I am working with large data frames in R that contain a mix of numbers and variable-length strings. I've tried using the rhdf5 package to write them out and read them back, but I haven't been able to figure out how to use the package correctly. Below is a toy data frame that causes R to segfault, at least on my machine. I would greatly appreciate either some pointers about what I'm doing wrong or a suggestion for another way to store my data.
rndString <- function(n = 1) {
  # Return n random alphanumeric strings, each 3 to 20 characters long
  chars <- c(0:9, letters, LETTERS)
  vapply(seq_len(n), function(i) {
    paste(sample(chars, sample(3:20, 1), replace = TRUE), collapse = "")
  }, character(1))
}
library(rhdf5)
n <- 1000000
d <- data.frame(id=seq(n),name=rndString(n),val=rnorm(n),stringsAsFactors=FALSE)
h5createFile("test.h5")
h5write(d,file="test.h5",name="d")
dd <- h5read("test.h5",name="d")
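On the question of another way to store the data: one stopgap outside HDF5 is base R's saveRDS()/readRDS(), which round-trips a data frame with character columns of any length. The following is only a minimal sketch, assuming a single R-only file is acceptable. The names d_small, dd_small, n_small and rds_file are illustrative, and the frame is a small stand-in for d above, not the full 1,000,000-row case.

```r
# Hypothetical small stand-in for the data frame `d` above
set.seed(1)
n_small <- 1000
d_small <- data.frame(
  id = seq_len(n_small),
  name = vapply(seq_len(n_small), function(i) {
    paste(sample(c(0:9, letters, LETTERS), sample(3:20, 1), replace = TRUE),
          collapse = "")
  }, character(1)),
  val = rnorm(n_small),
  stringsAsFactors = FALSE
)

rds_file <- tempfile(fileext = ".rds")  # illustrative path, not from the post
saveRDS(d_small, rds_file)              # serialize the whole data frame
dd_small <- readRDS(rds_file)           # read it back

# Check that the round trip preserved values and column types
stopifnot(identical(d_small, dd_small))
```

This sidesteps rhdf5 rather than fixing the segfault, and the resulting file is readable only from R, so it does not replace HDF5 where other tools need to read the data.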
John Estrada
-- output of sessionInfo():
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rhdf5_2.6.0
loaded via a namespace (and not attached):
[1] zlibbioc_1.8.0
--
Sent via the guest posting facility at bioconductor.org.