[Bioc-devel] Printing DataFrame with nested data.frame/DataFrame/DataFrameList
Jialin Ma
marlin- at gmx.cn
Thu Sep 28 10:19:03 CEST 2017
Dear all,
I have a package under review at
https://github.com/Bioconductor/Contributions/issues/487, in which I
would like to use a GRanges with nested data.frame or DataFrameList to
represent the track data internally.
However, the default show method does not seem to work well with such
structures.
I have an example for GRanges in which one meta-column is a one-column
data frame:
gr <- GRanges("chr21", IRanges(1:5, width = 1))
gr$df <- data.frame(x = 1:5)
show(gr)
GRanges object with 5 ranges and 1 metadata column:
Error in .Method(..., deparse.level = deparse.level) :
number of rows of matrices must match (see arg 3)
However, if the nested data frame has two columns, it can be printed
out correctly:
gr <- GRanges("chr21", IRanges(1:5, width = 1))
gr$df <- data.frame(x = 1:5, y = 11:15)
show(gr)
GRanges object with 5 ranges and 1 metadata column:
      seqnames    ranges strand |           df
         <Rle> <IRanges>  <Rle> | <data.frame>
  [1]    chr21    [1, 1]      * |         1:11
  [2]    chr21    [2, 2]      * |         2:12
  [3]    chr21    [3, 3]      * |         3:13
  [4]    chr21    [4, 4]      * |         4:14
  [5]    chr21    [5, 5]      * |         5:15
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
In some cases, it is printed with a warning message, but the format
is wrong:
gr <- GRanges("chr21", IRanges(1:5, width = 1), emm = 6:10)
gr$df <- data.frame(x = 1:5)
show(gr)
# The nested df is not printed in the correct format: it has only
# one column.
GRanges object with 5 ranges and 2 metadata columns:
      seqnames    ranges strand |       emm           df
         <Rle> <IRanges>  <Rle> | <integer> <data.frame>
  [1]    chr21    [1, 1]      * |         6    1,2,3,...
  [2]    chr21    [2, 2]      * |         7    1,2,3,...
  [3]    chr21    [3, 3]      * |         8    1,2,3,...
  [4]    chr21    [4, 4]      * |         9    1,2,3,...
  [5]    chr21    [5, 5]      * |        10    1,2,3,...
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
Warning message:
In (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
  row names were found from a short variable and have been discarded
A nested DataFrameList cannot be printed:
DF <- DataFrame(x = 1:2)
DF$split = split(DataFrame(aa = 1:4), c(1,1,2,2))
show(DF)
DataFrame with 2 rows and 2 columns
Error in dim(object) <- c(nrow(object), prod(tail(dim(object), -1))) :
  invalid first argument
class(DF$split)
[1] "CompressedSplitDataFrameList"
attr(,"package")
[1] "IRanges"
In the case above, I understand that it is hard to create a short
string representation of the nested structure, but I think printing
dimensions of the nested element may be sufficient.
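To make the suggestion concrete, here is a rough sketch of the kind of per-row label I have in mind. The helper name `nested_dims` is hypothetical and is not part of S4Vectors or IRanges, and the example uses a base R data.frame as a stand-in for the split DataFrame above, so this only illustrates the idea and is not a proposed patch to the show method.

```r
# Hypothetical helper (not part of S4Vectors or IRanges): summarise each
# nested element by its dimensions instead of its contents.
nested_dims <- function(x) {
  vapply(x, function(el) {
    d <- dim(el)
    if (is.null(d)) {
      # Plain vectors have no dim attribute, so fall back to their length
      sprintf("<%d>", length(el))
    } else {
      sprintf("<%s>", paste(d, collapse = " x "))
    }
  }, character(1), USE.NAMES = FALSE)
}

# Base R stand-in for split(DataFrame(aa = 1:4), c(1,1,2,2))
parts <- split(data.frame(aa = 1:4), c(1, 1, 2, 2))
labels <- nested_dims(parts)
print(labels)
```

Each row of the outer table would then show a short fixed-width label for the nested element, which keeps the display compact regardless of how many columns the nested element has.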
Any comments?
Best,
Jialin
-----------
Session Info:
R version 3.4.1 (2017-06-30)
Platform: x86_64-suse-linux-gnu (64-bit)
Running under: openSUSE Tumbleweed
Matrix products: default
BLAS: /usr/lib64/R/lib/libRblas.so
LAPACK: /usr/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] Biobase_2.37.2 GenomicRanges_1.29.14 GenomeInfoDb_1.13.4
[4] IRanges_2.11.17 S4Vectors_0.15.8 BiocGenerics_0.23.1
[7] magrittr_1.5
loaded via a namespace (and not attached):
[1] zlibbioc_1.23.0 compiler_3.4.1 XVector_0.17.1
[4] tools_3.4.1 GenomeInfoDbData_0.99.1 RCurl_1.95-4.8
[7] ulimit_0.0-3 bitops_1.0-6
r$> DF$split <- DF$split %>% as.list %>% lapply(as.data.frame)
r$> DF
DataFrame with 2 rows and 2 columns
          x  split
  <integer> <list>
1         1    1,2
2         2    3,4