[Bioc-devel] Printing DataFrame with nested data.frame/DataFrame/DataFrameList
Hervé Pagès
hpages at fredhutch.org
Thu Sep 28 22:47:13 CEST 2017
Hi Jialin,
Thanks for the excellent report. These "show" methods, like
many others in Bioconductor, rely on the low-level helper showAsCell(),
which was not working properly on data-frame-like or array-like
objects with a single column, or on SplitDataFrameList objects.
This should now be addressed. The fix is in S4Vectors 0.14.5
(release) and 0.15.10 (devel). Both should become available
via biocLite() in about 24 hours.
Let us know if you still see "show" problems after you update.
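For anyone checking whether an installed S4Vectors already carries this fix, here is a minimal sketch in R. The helper name has_show_fix() is hypothetical (it is not part of S4Vectors), and the branch rule is an assumption read off the two version numbers above: anything below 0.15.0 is treated as the release branch, anything at or above it as devel or later.

```r
# Hypothetical helper, not part of S4Vectors: does a given S4Vectors
# version include the showAsCell() fix (0.14.5 release, 0.15.10 devel)?
has_show_fix <- function(version) {
  v <- package_version(version)
  if (v < "0.15.0") {
    v >= "0.14.5"    # release branch (assumed to be everything below 0.15.0)
  } else {
    v >= "0.15.10"   # devel branch and anything later
  }
}

has_show_fix("0.15.8")   # FALSE
has_show_fix("0.14.5")   # TRUE

# With S4Vectors installed, the same check against the local copy would be:
# has_show_fix(as.character(packageVersion("S4Vectors")))
```

The session info quoted below lists S4Vectors_0.15.8, which this check reports as predating the fix.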
Thanks,
H.
On 09/28/2017 01:19 AM, Jialin Ma wrote:
> Dear all,
>
> I have a package under review at
> https://github.com/Bioconductor/Contributions/issues/487, in which I
> would like to use a GRanges with nested data.frame or DataFrameList to
> represent the track data internally.
>
> However, the default show method does not seem to work well with such
> structures.
>
> I have an example for GRanges in which one meta-column is a one-column
> data frame:
>
> gr <- GRanges("chr21", IRanges(1:5, width = 1))
> gr$df <- data.frame(x = 1:5)
> show(gr)
>
> GRanges object with 5 ranges and 1 metadata column:
> Error in .Method(..., deparse.level = deparse.level) :
> number of rows of matrices must match (see arg 3)
>
> However, if the nested data frame has two columns, it can be printed
> out correctly:
>
> gr <- GRanges("chr21", IRanges(1:5, width = 1))
> gr$df <- data.frame(x = 1:5, y = 11:15)
> show(gr)
>
> GRanges object with 5 ranges and 1 metadata column:
>       seqnames    ranges strand |           df
>          <Rle> <IRanges>  <Rle> | <data.frame>
>   [1]    chr21    [1, 1]      * |         1:11
>   [2]    chr21    [2, 2]      * |         2:12
>   [3]    chr21    [3, 3]      * |         3:13
>   [4]    chr21    [4, 4]      * |         4:14
>   [5]    chr21    [5, 5]      * |         5:15
>   -------
>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
>
> In some cases, it prints with a warning message, but the format
> is wrong:
>
> gr <- GRanges("chr21", IRanges(1:5, width = 1), emm = 6:10)
> gr$df <- data.frame(x = 1:5)
> show(gr)
>
> # The nested df is not printed with correct format, there is only
> # one column in the nested df.
>
> GRanges object with 5 ranges and 2 metadata columns:
>       seqnames    ranges strand |       emm           df
>          <Rle> <IRanges>  <Rle> | <integer> <data.frame>
>   [1]    chr21    [1, 1]      * |         6    1,2,3,...
>   [2]    chr21    [2, 2]      * |         7    1,2,3,...
>   [3]    chr21    [3, 3]      * |         8    1,2,3,...
>   [4]    chr21    [4, 4]      * |         9    1,2,3,...
>   [5]    chr21    [5, 5]      * |        10    1,2,3,...
>   -------
>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
> Warning message:
> In (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  :
>   row names were found from a short variable and have been discarded
>
> A nested DataFrameList cannot be printed:
>
> DF <- DataFrame(x = 1:2)
> DF$split = split(DataFrame(aa = 1:4), c(1,1,2,2))
> show(DF)
>
> DataFrame with 2 rows and 2 columns
> Error in dim(object) <- c(nrow(object), prod(tail(dim(object), -1))) :
>   invalid first argument
>
> class(DF$split)
>
> [1] "CompressedSplitDataFrameList"
> attr(,"package")
> [1] "IRanges"
>
> In the case above, I understand that it is hard to create a short
> string representation of the nested structure, but I think printing
> dimensions of the nested element may be sufficient.
>
> Any comments?
>
> Best,
> Jialin
>
> -----------
> Session Info:
>
> R version 3.4.1 (2017-06-30)
> Platform: x86_64-suse-linux-gnu (64-bit)
> Running under: openSUSE Tumbleweed
>
> Matrix products: default
> BLAS: /usr/lib64/R/lib/libRblas.so
> LAPACK: /usr/lib64/R/lib/libRlapack.so
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats4    parallel  stats     graphics  grDevices utils     datasets
> [8] methods   base
>
> other attached packages:
> [1] Biobase_2.37.2 GenomicRanges_1.29.14 GenomeInfoDb_1.13.4
> [4] IRanges_2.11.17 S4Vectors_0.15.8 BiocGenerics_0.23.1
> [7] magrittr_1.5
>
> loaded via a namespace (and not attached):
> [1] zlibbioc_1.23.0         compiler_3.4.1          XVector_0.17.1
> [4] tools_3.4.1             GenomeInfoDbData_0.99.1 RCurl_1.95-4.8
> [7] ulimit_0.0-3            bitops_1.0-6
>
> r$> DF$split <- DF$split %>% as.list %>% lapply(as.data.frame)
>
> r$> DF
>
> DataFrame with 2 rows and 2 columns
>           x  split
>   <integer> <list>
> 1         1    1,2
> 2         2    3,4
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319