[Bioc-devel] Printing DataFrame with nested data.frame/DataFrame/DataFrameList
Jialin Ma
marlin- at gmx.cn
Thu Sep 28 23:19:55 CEST 2017
Hi Hervé,
Thanks for addressing it so quickly. I will check it when the new
version is available via biocLite().
Thanks!
Jialin
On Thu, 2017-09-28 at 13:47 -0700, Hervé Pagès wrote:
> Hi Jialin,
>
> Thanks for the excellent report. These "show" methods, like
> many others in Bioconductor, rely on the low-level helper showAsCell(),
> which was not working properly on data-frame-like or array-like
> objects with a single column, or on SplitDataFrameList objects.
>
> This should now be addressed. The fix is in S4Vectors 0.14.5
> (release) and 0.15.10 (devel). Both should become available
> via biocLite() in about 24 hours.
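
A minimal sketch for checking whether an installed S4Vectors is at or past the versions named above. The helper name has_show_fix() is hypothetical (it is not part of S4Vectors), and the sketch assumes that any later version on either branch also carries the fix.

```r
# Hypothetical helper, not part of S4Vectors: TRUE if version `v` is at or
# past the fixed versions named above (0.14.5 release, 0.15.10 devel).
# Assumption: every later version on either branch also contains the fix.
has_show_fix <- function(v) {
  v <- package_version(v)
  if (v < "0.15.0") v >= "0.14.5" else v >= "0.15.10"
}

# Only query the installed version if S4Vectors is actually available.
if (requireNamespace("S4Vectors", quietly = TRUE)) {
  print(has_show_fix(packageVersion("S4Vectors")))
}
```

With the S4Vectors_0.15.8 listed in the session info further down, this check would report that the fix is not yet installed.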
>
> Let us know if you still see "show" problems after you update.
>
> Thanks,
> H.
>
> On 09/28/2017 01:19 AM, Jialin Ma wrote:
> > Dear all,
> >
> > I have a package under review at
> > https://github.com/Bioconductor/Contributions/issues/487, in which I
> > would like to use a GRanges with a nested data.frame or DataFrameList
> > to represent the track data internally.
> >
> > However, the default show method does not seem to work well with
> > such structures.
> >
> > I have an example for GRanges in which one meta-column is a
> > one-column data frame:
> >
> > gr <- GRanges("chr21", IRanges(1:5, width = 1))
> > gr$df <- data.frame(x = 1:5)
> > show(gr)
> >
> > GRanges object with 5 ranges and 1 metadata column:
> > Error in .Method(..., deparse.level = deparse.level) :
> > number of rows of matrices must match (see arg 3)
> >
> > However, if the nested data frame has two columns, it can be
> > printed out correctly:
> >
> > gr <- GRanges("chr21", IRanges(1:5, width = 1))
> > gr$df <- data.frame(x = 1:5, y = 11:15)
> > show(gr)
> >
> > GRanges object with 5 ranges and 1 metadata column:
> >       seqnames    ranges strand |           df
> >          <Rle> <IRanges>  <Rle> | <data.frame>
> >   [1]    chr21    [1, 1]      * |         1:11
> >   [2]    chr21    [2, 2]      * |         2:12
> >   [3]    chr21    [3, 3]      * |         3:13
> >   [4]    chr21    [4, 4]      * |         4:14
> >   [5]    chr21    [5, 5]      * |         5:15
> >   -------
> >   seqinfo: 1 sequence from an unspecified genome; no seqlengths
> >
> > In some cases, it can be printed with a warning message, but the
> > output format is wrong:
> >
> > gr <- GRanges("chr21", IRanges(1:5, width = 1), emm = 6:10)
> > gr$df <- data.frame(x = 1:5)
> > show(gr)
> >
> > # The nested df is not printed in the correct format: there is only
> > # one column in the nested df.
> >
> > GRanges object with 5 ranges and 2 metadata columns:
> >       seqnames    ranges strand |       emm           df
> >          <Rle> <IRanges>  <Rle> | <integer> <data.frame>
> >   [1]    chr21    [1, 1]      * |         6    1,2,3,...
> >   [2]    chr21    [2, 2]      * |         7    1,2,3,...
> >   [3]    chr21    [3, 3]      * |         8    1,2,3,...
> >   [4]    chr21    [4, 4]      * |         9    1,2,3,...
> >   [5]    chr21    [5, 5]      * |        10    1,2,3,...
> >   -------
> >   seqinfo: 1 sequence from an unspecified genome; no seqlengths
> > Warning message:
> > In (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  :
> >   row names were found from a short variable and have been discarded
> >
> > A nested DataFrameList cannot be printed:
> >
> > DF <- DataFrame(x = 1:2)
> > DF$split = split(DataFrame(aa = 1:4), c(1,1,2,2))
> > show(DF)
> >
> > DataFrame with 2 rows and 2 columns
> > Error in dim(object) <- c(nrow(object), prod(tail(dim(object), -1))) :
> >   invalid first argument
> >
> > class(DF$split)
> >
> > [1] "CompressedSplitDataFrameList"
> > attr(,"package")
> > [1] "IRanges"
> >
> > In the case above, I understand that it is hard to create a short
> > string representation of the nested structure, but I think printing
> > dimensions of the nested element may be sufficient.
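
A rough sketch of the dimension-based summary suggested above. The helper dim_label() is hypothetical and is not how S4Vectors implements its display; plain base-R data frames stand in for the elements of DF$split so that the example runs without Bioconductor.

```r
# Hypothetical helper, not the S4Vectors implementation: label each nested
# element by its dimensions, e.g. "2x1" for 2 rows and 1 column.
dim_label <- function(x) {
  vapply(x, function(el) paste(dim(el), collapse = "x"), character(1))
}

# Base-R stand-in for DF$split: two data frames with 2 rows and 1 column each.
nested <- split(data.frame(aa = 1:4), c(1, 1, 2, 2))
print(dim_label(nested))
```

Each cell of the printed column could then show one such label instead of an attempt at the full contents.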
> >
> > Any comments?
> >
> > Best,
> > Jialin
> >
> > -----------
> > Session Info:
> >
> > R version 3.4.1 (2017-06-30)
> > Platform: x86_64-suse-linux-gnu (64-bit)
> > Running under: openSUSE Tumbleweed
> >
> > Matrix products: default
> > BLAS: /usr/lib64/R/lib/libRblas.so
> > LAPACK: /usr/lib64/R/lib/libRlapack.so
> >
> > locale:
> > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> > [9] LC_ADDRESS=C LC_TELEPHONE=C
> > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] stats4    parallel  stats     graphics  grDevices utils     datasets
> > [8] methods   base
> >
> > other attached packages:
> > [1] Biobase_2.37.2        GenomicRanges_1.29.14 GenomeInfoDb_1.13.4
> > [4] IRanges_2.11.17       S4Vectors_0.15.8      BiocGenerics_0.23.1
> > [7] magrittr_1.5
> >
> > loaded via a namespace (and not attached):
> > [1] zlibbioc_1.23.0         compiler_3.4.1          XVector_0.17.1
> > [4] tools_3.4.1             GenomeInfoDbData_0.99.1 RCurl_1.95-4.8
> > [7] ulimit_0.0-3            bitops_1.0-6
> >
> > r$> DF$split <- DF$split %>% as.list %>% lapply(as.data.frame)
> >
> > r$> DF
> > DataFrame with 2 rows and 2 columns
> >           x  split
> >   <integer> <list>
> > 1         1    1,2
> > 2         2    3,4
> >
> >
> > _______________________________________________
> > Bioc-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >
>
>