[Bioc-devel] rhdf5 help
Joseph Nathaniel Paulson
jpaulson at umiacs.umd.edu
Sat Oct 25 21:07:09 CEST 2014
Hello,
I'm in the process of writing a few wrappers for reading and writing
files in the biom-format
<http://biom-format.org/documentation/format_versions/biom-2.1.html>, which
is stored as HDF5. The rhdf5 package is great, but the information at the
beginning of every file (for example,
https://github.com/biocore/biom-format/blob/master/examples/rich_sparse_otu_table_hdf5.biom)
is missing from its output, even though I can see it with the command-line
tool h5dump.
Running h5dump version 1.8.7, I'm able to see *creation-date*, *format-url*,
*format-version*, etc. (see below).
However, when I run h5read or h5ls on the same file, none of these entries
(which h5dump lists as attributes of the root group) come up. My goal is to
retrieve format-version and the other entries that are not showing up.
Thank you,
Joseph Paulson
Example:
# in R
> str(h5read("./rich_sparse_otu_table_hdf5.biom","/"))
List of 2
$ observation:List of 4
..$ group-metadata: NULL
..$ ids : chr [1:5(1d)] "GG_OTU_1" "GG_OTU_2" "GG_OTU_3" "GG_OTU_4" ...
..$ matrix :List of 3
.. ..$ data : num [1:15(1d)] 1 5 1 2 3 1 1 4 2 2 ...
.. ..$ indices: int [1:15(1d)] 2 0 1 3 4 5 2 3 5 0 ...
.. ..$ indptr : int [1:6(1d)] 0 1 6 9 13 15
..$ metadata :List of 1
.. ..$ taxonomy: chr [1:7, 1:5] "k__Bacteria" "p__Proteobacteria" "c__Gammaproteobacteria" "o__Enterobacteriales" ...
$ sample :List of 4
..$ group-metadata: NULL
..$ ids : chr [1:6(1d)] "Sample1" "Sample2" "Sample3" "Sample4" ...
..$ matrix :List of 3
.. ..$ data : num [1:15(1d)] 5 2 1 1 1 1 1 1 1 2 ...
.. ..$ indices: int [1:15(1d)] 1 3 1 3 4 0 2 3 4 1 ...
.. ..$ indptr : int [1:7(1d)] 0 2 5 9 11 12 15
..$ metadata :List of 4
.. ..$ BODY_SITE : chr [1:6(1d)] "gut" "gut" "gut" "skin" ...
.. ..$ BarcodeSequence : chr [1:6(1d)] "CGCTTATCGAGA" "CATACCAGTAGC" "CTCTCTACCTGT" "CTCTCGGCCTGT" ...
.. ..$ Description : chr [1:6(1d)] "human gut" "human gut" "human gut" "human skin" ...
.. ..$ LinkerPrimerSequence: chr [1:6(1d)] "CATGCTGCCTCCCGTAGGAGT" "CATGCTGCCTCCCGTAGGAGT" "CATGCTGCCTCCCGTAGGAGT" "CATGCTGCCTCCCGTAGGAGT" ...
> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rhdf5_2.10.0 BiocInstaller_1.16.0
loaded via a namespace (and not attached):
[1] tools_3.1.0 zlibbioc_1.12.0
# Terminal
$ ./hdf5-1.8.7-mac-intel-x86_64-static/bin/h5dump ./rich_sparse_otu_table_hdf5.biom
HDF5 "./rich_sparse_otu_table_hdf5.biom" {
GROUP "/" {
ATTRIBUTE "creation-date" {
DATATYPE H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_ASCII;
CTYPE H5T_C_S1;
}
DATASPACE SCALAR
DATA {
(0): "2014-07-29T16:16:36.617320"
}
}
ATTRIBUTE "format-url" {
DATATYPE H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_ASCII;
CTYPE H5T_C_S1;
}
DATASPACE SCALAR
DATA {
(0): "http://biom-format.org"
}
}
ATTRIBUTE "format-version" {
DATATYPE H5T_STD_I64LE
DATASPACE SIMPLE { ( 2 ) / ( 2 ) }
DATA {
(0): 2, 1
}
}
ATTRIBUTE "generated-by" {
DATATYPE H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_ASCII;
CTYPE H5T_C_S1;
}
DATASPACE SCALAR
DATA {
(0): "example"
}
}
ATTRIBUTE "id" {
DATATYPE H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_ASCII;
CTYPE H5T_C_S1;
}
DATASPACE SCALAR
DATA {
(0): "No Table ID"
}
}
ATTRIBUTE "nnz" {
DATATYPE H5T_STD_I64LE
DATASPACE SCALAR
DATA {
(0): 15
}
}
ATTRIBUTE "shape" {
DATATYPE H5T_STD_I64LE
DATASPACE SIMPLE { ( 2 ) / ( 2 ) }
DATA {
(0): 5, 6
}
}
ATTRIBUTE "type" {
DATATYPE H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_ASCII;
CTYPE H5T_C_S1;
}
DATASPACE SCALAR
DATA {
(0): "otu table"
}
}
.....