[Bioc-devel] Printing DataFrame with nested data.frame/DataFrame/DataFrameList

Jialin Ma marlin- at gmx.cn
Thu Sep 28 23:19:55 CEST 2017


Hi Hervé,

Thanks for addressing it so quickly. I will check it when the new
version is available via biocLite().
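
A minimal sketch of that check, assuming the fixed versions given in the
quoted message (0.14.5 on the release branch, 0.15.10 on devel). The
helper name has_show_fix is hypothetical and not part of S4Vectors; it
only compares version strings in base R:

```r
# Hypothetical helper (not part of S4Vectors): TRUE if version `v` is at
# or past the fixed version on its branch (0.14.x release, 0.15.x devel).
has_show_fix <- function(v) {
  v <- package_version(v)
  if (v >= "0.15.0") v >= "0.15.10" else v >= "0.14.5"
}

has_show_fix("0.15.8")   # version listed in the session info of the report
has_show_fix("0.14.5")   # fixed release version

# For the installed package one could call:
#   has_show_fix(packageVersion("S4Vectors"))
```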

Thanks!
Jialin

On Thu, 2017-09-28 at 13:47 -0700, Hervé Pagès wrote:
> Hi Jialin,
> 
> Thanks for the excellent report. These "show" methods, like
> many others in Bioconductor, rely on the low-level helper showAsCell(),
> which was not working properly on data-frame-like or array-like
> objects with a single column, or on SplitDataFrameList objects.
> 
> This should now be addressed. The fix is in S4Vectors 0.14.5
> (release) and 0.15.10 (devel). Both should become available
> via biocLite() in about 24 hours.
> 
> Let us know if you still see "show" problems after you update.
> 
> Thanks,
> H.
> 
> On 09/28/2017 01:19 AM, Jialin Ma wrote:
> > Dear all,
> > 
> > I have a package under review at
> > https://github.com/Bioconductor/Contributions/issues/487, in which I
> > would like to use a GRanges with a nested data.frame or DataFrameList
> > to represent the track data internally.
> > 
> > However, the default show method does not seem to work well with
> > such structures.
> > 
> > I have an example for GRanges in which one meta-column is a
> > one-column data frame:
> > 
> >      gr <- GRanges("chr21", IRanges(1:5, width = 1))
> >      gr$df <- data.frame(x = 1:5)
> >      show(gr)
> > 
> >      GRanges object with 5 ranges and 1 metadata column:
> >      Error in .Method(..., deparse.level = deparse.level) :
> >        number of rows of matrices must match (see arg 3)
> > 
> > However, if the nested data frame has two columns, it can be
> > printed out correctly:
> > 
> >      gr <- GRanges("chr21", IRanges(1:5, width = 1))
> >      gr$df <- data.frame(x = 1:5, y = 11:15)
> >      show(gr)
> > 
> >      GRanges object with 5 ranges and 1 metadata column:
> >            seqnames    ranges strand |           df
> >               <Rle> <IRanges>  <Rle> | <data.frame>
> >        [1]    chr21    [1, 1]      * |         1:11
> >        [2]    chr21    [2, 2]      * |         2:12
> >        [3]    chr21    [3, 3]      * |         3:13
> >        [4]    chr21    [4, 4]      * |         4:14
> >        [5]    chr21    [5, 5]      * |         5:15
> >        -------
> >        seqinfo: 1 sequence from an unspecified genome; no seqlengths
> > 
> > In some cases, it is printed with a warning message, but the
> > format is wrong:
> > 
> >      gr <- GRanges("chr21", IRanges(1:5, width = 1), emm = 6:10)
> >      gr$df <- data.frame(x = 1:5)
> >      show(gr)
> > 
> >      # The nested df is not printed in the correct format; there is
> >      # only one column in the nested df.
> > 
> >      GRanges object with 5 ranges and 2 metadata columns:
> >            seqnames    ranges strand |       emm           df
> >               <Rle> <IRanges>  <Rle> | <integer> <data.frame>
> >        [1]    chr21    [1, 1]      * |         6    1,2,3,...
> >        [2]    chr21    [2, 2]      * |         7    1,2,3,...
> >        [3]    chr21    [3, 3]      * |         8    1,2,3,...
> >        [4]    chr21    [4, 4]      * |         9    1,2,3,...
> >        [5]    chr21    [5, 5]      * |        10    1,2,3,...
> >        -------
> >        seqinfo: 1 sequence from an unspecified genome; no seqlengths
> >      Warning message:
> >      In (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  :
> >        row names were found from a short variable and have been discarded
> > 
> > A nested DataFrameList cannot be printed:
> > 
> >      DF <- DataFrame(x = 1:2)
> >      DF$split = split(DataFrame(aa = 1:4), c(1,1,2,2))
> >      show(DF)
> > 
> >      DataFrame with 2 rows and 2 columns
> >      Error in dim(object) <- c(nrow(object), prod(tail(dim(object), -1))) :
> >        invalid first argument
> > 
> >      class(DF$split)
> > 
> >      [1] "CompressedSplitDataFrameList"
> >      attr(,"package")
> >      [1] "IRanges"
> > 
> > In the case above, I understand that it is hard to create a short
> > string representation of the nested structure, but I think printing
> > the dimensions of the nested element may be sufficient.
> > 
> > Any comments?
> > 
> > Best,
> > Jialin
> > 
> >      -----------
> >      Session Info:
> > 
> >      R version 3.4.1 (2017-06-30)
> >      Platform: x86_64-suse-linux-gnu (64-bit)
> >      Running under: openSUSE Tumbleweed
> > 
> >      Matrix products: default
> >      BLAS: /usr/lib64/R/lib/libRblas.so
> >      LAPACK: /usr/lib64/R/lib/libRlapack.so
> > 
> >      locale:
> >       [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> >       [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> >       [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
> >       [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
> >       [9] LC_ADDRESS=C               LC_TELEPHONE=C
> >      [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> > 
> >      attached base packages:
> >      [1] stats4    parallel  stats     graphics  grDevices utils     datasets
> >      [8] methods   base
> > 
> >      other attached packages:
> >      [1] Biobase_2.37.2        GenomicRanges_1.29.14 GenomeInfoDb_1.13.4
> >      [4] IRanges_2.11.17       S4Vectors_0.15.8      BiocGenerics_0.23.1
> >      [7] magrittr_1.5
> > 
> >      loaded via a namespace (and not attached):
> >      [1] zlibbioc_1.23.0         compiler_3.4.1          XVector_0.17.1
> >      [4] tools_3.4.1             GenomeInfoDbData_0.99.1 RCurl_1.95-4.8
> >      [7] ulimit_0.0-3            bitops_1.0-6
> > 
> >      r$> DF$split <- DF$split %>% as.list %>% lapply(as.data.frame)
> > 
> >      r$> DF
> > 
> >      DataFrame with 2 rows and 2 columns
> >                x  split
> >        <integer> <list>
> >      1         1    1,2
> >      2         2    3,4
> > 
> > _______________________________________________
> > Bioc-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> > 
> 
> 
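
The suggestion in the original report, printing only the dimensions of
each nested element, can be sketched in base R as follows. This is an
illustration under stated assumptions, not how showAsCell() is
implemented: dim_label is a hypothetical helper, and a plain list of
data frames stands in for the CompressedSplitDataFrameList.

```r
# Hypothetical helper: one short label per nested element, "<rows x cols>".
dim_label <- function(elements) {
  vapply(elements,
         function(e) sprintf("<%d x %d>", nrow(e), ncol(e)),
         character(1), USE.NAMES = FALSE)
}

# Base R stand-in for split(DataFrame(aa = 1:4), c(1,1,2,2)) in the report.
nested <- split(data.frame(aa = 1:4), c(1, 1, 2, 2))

dim_label(nested)
```

Each label has a fixed, short width regardless of how large the nested
element is, which is what makes it usable as a cell in a printed table.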


