[Bioc-devel] as.data.frame for GRanges when one meta column is a data frame
Hervé Pagès
hpages at fredhutch.org
Thu Jul 5 19:59:30 CEST 2018
Hi Jialin,
Note that up to BioC 3.7, as.data.frame(gr) in your example
was returning a broken data.frame:
> as.data.frame(gr)
Error in dim(rvec) <- dim(x) :
dims [product 6] do not match the length of object [1]
More precisely, the call to as.data.frame(gr) succeeds and returns
a data.frame, but that data.frame cannot be displayed:
> df2 <- as.data.frame(gr)
> df2
Error in dim(rvec) <- dim(x) :
dims [product 6] do not match the length of object [1]
The problem is with the print.data.frame() method:
> print.data.frame(df2)
Error in dim(rvec) <- dim(x) :
dims [product 6] do not match the length of object [1]
Feel free to bring this up with the R-devel folks.
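The kind of nested column involved here can also be reproduced in base R, without GenomicRanges. The sketch below builds a data.frame whose "df" column is itself a data.frame, once wrapped in I() (which gives the 'AsIs' class seen in the old str() output quoted below) and once assigned with `$<-` (no 'AsIs' class). The object names are arbitrary, and the snippet only inspects classes and dimensions rather than printing, since print.data.frame() is the step that fails above and its behavior on such objects may depend on the R version.

```r
# A 6-row data.frame to nest inside another data.frame.
inner <- data.frame(x = runif(6))

# Wrapping in I() keeps "df" as a single nested column with class
# c("AsIs", "data.frame"), like the old as.data.frame(gr) result.
nested_asis <- data.frame(start = 1:6, df = I(inner))
class(nested_asis$df)   # "AsIs" "data.frame"
dim(nested_asis)        # 6 2

# Assigning with `$<-` also nests the data.frame, but adds no "AsIs" class.
nested_plain <- data.frame(start = 1:6)
nested_plain$df <- inner
class(nested_plain$df)  # "data.frame"

# str() describes the nested structure without going through print.data.frame().
str(nested_asis)
str(nested_plain)
```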
Anyway, since it's not clear whether data.frame objects are actually
expected to support nesting, it's safer to have as.data.frame()
get rid of the nesting.
Furthermore, as.data.frame() **has** to "un-nest" nested objects
in the general case, e.g. when the nested objects are S4
vector-like objects such as Hits, GRanges, DataFrame, etc. That's
because an ordinary data.frame cannot contain these objects. So it
seems preferable to un-nest everything rather than make an exception
when the metadata column is a data.frame. In particular, such an
exception would lead to inconsistent behavior if the data.frame column
were replaced with a DataFrame.
For the record, here is the commit that refactored as.data.frame()
to un-nest everything:
https://github.com/Bioconductor/S4Vectors/commit/d84bc18dea7a232061946fbfe30d2072b88705a7
With this new approach, as.data.frame() can work on "complicated"
objects, i.e. objects with an arbitrary number of nesting levels.
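To make "un-nesting" concrete for an ordinary data.frame, here is a minimal base-R sketch that recursively splices data.frame columns, at any depth, into top-level columns. flatten_df() is a hypothetical helper written for this illustration, not the code from the S4Vectors commit linked above, and its "outer.inner" column naming is an assumption of the sketch (the devel output quoted below names the flattened column simply x).

```r
# Hypothetical helper, NOT the S4Vectors implementation: recursively
# replace every data.frame column by its own (already flattened) columns.
flatten_df <- function(df) {
  cols <- list()
  for (nm in names(df)) {
    col <- df[[nm]]
    if (is.data.frame(col)) {
      # Un-nest deeper levels first, then prefix with the outer column name.
      inner <- flatten_df(col)
      names(inner) <- paste(nm, names(inner), sep = ".")
      cols <- c(cols, as.list(inner))
    } else {
      cols[[nm]] <- col
    }
  }
  # check.names = FALSE keeps the "outer.inner" names exactly as built.
  data.frame(cols, check.names = FALSE, stringsAsFactors = FALSE)
}

# Two levels of nesting: column "inner" contains column "deeper".
d <- data.frame(a = 1:3)
d$inner <- data.frame(x = 4:6)
d$inner$deeper <- data.frame(y = c("p", "q", "r"), stringsAsFactors = FALSE)

flat <- flatten_df(d)
names(flat)  # "a" "inner.x" "inner.deeper.y"
str(flat)
```

Recursing before prefixing is what lets the helper handle any depth: each level returns an already-flat data.frame, so the caller only has to rename and splice its columns.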
Hope this makes sense.
Cheers,
H.
On 07/04/2018 01:38 PM, Jialin Ma wrote:
> Dear all,
>
> It seems that the devel branch of Bioconductor has made
> changes/improvements to the behavior of as.data.frame(). When the
> input is a GRanges with a metadata column that is a data frame,
> as.data.frame() in devel flattens the nested data frame. Here is an example:
>
> library(GenomicRanges)
> gr <- GRanges("chr2", IRanges(1:6, width = 2))
> gr$df <- data.frame(x = runif(6))
> str(as.data.frame(gr))
>
> which shows:
>
> 'data.frame': 6 obs. of 6 variables:
> $ seqnames: Factor w/ 1 level "chr2": 1 1 1 1 1 1
> $ start : int 1 2 3 4 5 6
> $ end : int 2 3 4 5 6 7
> $ width : int 2 2 2 2 2 2
> $ strand : Factor w/ 3 levels "+","-","*": 3 3 3 3 3 3
> $ x : num 0.55 0.058 0.966 0.75 0.764 ...
>
> with session info:
>
> R version 3.5.0 (2018-04-23)
> Platform: x86_64-suse-linux-gnu (64-bit)
> Running under: openSUSE Tumbleweed
>
> attached base packages:
> [1] parallel stats4 stats graphics grDevices utils datasets
> [8] methods base
>
> other attached packages:
> [1] GenomicRanges_1.33.6 GenomeInfoDb_1.17.1 IRanges_2.15.14
> [4] S4Vectors_0.19.17 BiocGenerics_0.27.1 magrittr_1.5
>
> loaded via a namespace (and not attached):
> [1] zlibbioc_1.27.0 compiler_3.5.0 XVector_0.21.3
> [4] tools_3.5.0 GenomeInfoDbData_1.1.0 RCurl_1.95-4.10
> [7] yaml_2.1.19 bitops_1.0-6
>
>
> In the old version, the same code gives the following result:
>
> 'data.frame': 6 obs. of 6 variables:
> $ seqnames: Factor w/ 1 level "chr2": 1 1 1 1 1 1
> $ start : int 1 2 3 4 5 6
> $ end : int 2 3 4 5 6 7
> $ width : int 2 2 2 2 2 2
> $ strand : Factor w/ 3 levels "+","-","*": 3 3 3 3 3 3
> $ df :Classes ‘AsIs’ and 'data.frame': 6 obs. of 1 variable:
> ..$ x: num 0.935 0.577 0.245 0.687 0.194 ...
>
> with session info:
>
> R version 3.5.0 (2018-04-23)
> Platform: x86_64-suse-linux-gnu (64-bit)
> Running under: openSUSE Tumbleweed
>
> attached base packages:
> [1] parallel stats4 stats graphics grDevices utils datasets
> [8] methods base
>
> other attached packages:
> [1] GenomicRanges_1.32.3 GenomeInfoDb_1.17.1 IRanges_2.14.10
> [4] S4Vectors_0.18.3 BiocGenerics_0.27.1 magrittr_1.5
>
> loaded via a namespace (and not attached):
> [1] zlibbioc_1.27.0 compiler_3.5.0 BiocInstaller_1.30.0
> [4] XVector_0.21.3 tools_3.5.0 GenomeInfoDbData_1.1.0
> [7] RCurl_1.95-4.10 yaml_2.1.19 bitops_1.0-6
>
>
> I personally feel that automatically flattening the nested data frame
> may not be the right behavior. I am not sure about it, but I would like
> to suggest keeping the data frame column as is in as.data.frame()
> (and not adding the "AsIs" class, as it causes an error when the
> converted data frame is displayed).
>
> Any thoughts?
>
> Best regards,
> Jialin
>
>
>
> -------- Forwarded Message --------
> From: "Shepherd, Lori" <Lori.Shepherd at RoswellPark.org>
> To: marlin- at gmx.cn <marlin- at gmx.cn>
> Subject: failing Bioconductor package TnT
> Date: Tue, 3 Jul 2018 12:25:20 +0000
>
>> Dear TnT maintainer,
>>
>> I'd like to bring to your attention that the TnT package is failing
>> to pass 'R CMD build' on all platforms in the devel version of
>> Bioconductor (i.e. BioC 3.8):
>>
>> http://bioconductor.org/checkResults/devel/bioc-LATEST/TnT
>>
>> Would you mind taking a look at this? Don't hesitate to ask on the
>> bioc-devel at r-project.org mailing list if you have any questions or
>> need help.
>>
>>
>> While devel is a place to experiment with new features, we expect
>> packages to build and check cleanly in a reasonable time period and
>> not stay broken for any extended period of time. The package has been
>> failing since 06/11/18.
>>
>> If no action is taken over the next few weeks we will begin the
>> deprecation process for your package.
>>
>>
>> Thank you for your time and effort, and your continued contribution
>> to Bioconductor.
>>
>> Please be advised that Bioconductor has switched from svn to Git. Some
>> helpful links can be found here:
>> https://bioconductor.org/developers/how-to/git/
>> http://bioconductor.org/developers/how-to/git/bug-fix-in-release-and-devel/
>>
>>
>>
>> Lori Shepherd
>> Bioconductor Core Team
>> Roswell Park Cancer Institute
>> Department of Biostatistics & Bioinformatics
>> Elm & Carlton Streets
>> Buffalo, New York 14263
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319