[Bioc-devel] as.data.frame for GRanges when one meta column is a data frame

Hervé Pagès hpages using fredhutch.org
Thu Jul 5 19:59:30 CEST 2018


Hi Jialin,

Note that up to BioC 3.7, as.data.frame(gr) in your example
was returning a broken data.frame:

   > as.data.frame(gr)
   Error in dim(rvec) <- dim(x) :
     dims [product 6] do not match the length of object [1]

More precisely, the call to as.data.frame(gr) is successful and
returns a data.frame but that data.frame cannot be displayed:

   > df2 <- as.data.frame(gr)
   > df2
   Error in dim(rvec) <- dim(x) :
     dims [product 6] do not match the length of object [1]

The problem is with the print.data.frame() method:

   > print.data.frame(df2)
   Error in dim(rvec) <- dim(x) :
     dims [product 6] do not match the length of object [1]

Feel free to bring this up to the R devel folks.
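
For anyone who wants to poke at this without Bioconductor installed, here is a minimal base R sketch. It assumes that wrapping the nested data frame in I() gives the same 'AsIs' + 'data.frame' column that the old as.data.frame(gr) produced. That is a guess based on the str() output in the original post, not the actual code path. Whether print() fails may depend on the R version, so the call is wrapped in tryCatch():

```r
# Base R only. The nested column is built with I() to mimic the
# 'AsIs' + 'data.frame' classes reported by str() in the original post;
# this construction is an assumption, not the old as.data.frame() code path.
set.seed(1)
df2 <- data.frame(start = 1:6, end = 2:7)
df2$df <- I(data.frame(x = runif(6)))

# The object itself is a valid data.frame with a nested data.frame column.
class(df2$df)
dim(df2)

# Printing is attempted inside tryCatch() so the script keeps running
# whether or not this R version can display the nested column.
print_result <- tryCatch(
  {
    print(df2)
    "printed without error"
  },
  error = function(e) paste("print failed:", conditionMessage(e))
)
print_result
```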

Anyway, since it's not clear whether data.frame objects are actually
expected to support nesting, it's safer to have as.data.frame()
get rid of the nesting.

Furthermore, as.data.frame() **has** to "un-nest" nested objects
in the general case, e.g. when the nested objects are S4
vector-like objects such as Hits, GRanges, DataFrame, etc. That's
because an ordinary data.frame cannot contain these objects. So it
seems preferable to un-nest everything rather than making an exception
when the metadata column is a data.frame. In particular, this exception
would lead to inconsistent behavior if the data.frame column is replaced
with a DataFrame.

For the record, here is the commit that refactored as.data.frame()
to un-nest everything:

 
https://github.com/Bioconductor/S4Vectors/commit/d84bc18dea7a232061946fbfe30d2072b88705a7

With this new approach, as.data.frame() can work on "complicated"
objects i.e. on objects with an arbitrary number of nesting levels.
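
To illustrate what un-nesting to an arbitrary depth means, here is a rough base R sketch. The helper unnest_df() is hypothetical and is not the S4Vectors implementation from the commit above. It only handles nested data.frame columns (not S4 objects), and a nested column that shares a name with an earlier column simply overwrites it:

```r
# Hypothetical helper, not part of S4Vectors: recursively replace every
# data.frame column with its own (already flattened) columns.
unnest_df <- function(df) {
  cols <- list()
  for (nm in names(df)) {
    col <- df[[nm]]
    if (is.data.frame(col)) {
      # Drop "AsIs" so the inner data frame is treated as a plain one.
      class(col) <- setdiff(class(col), "AsIs")
      inner <- unnest_df(col)
      # Splice the inner columns in under their own names.
      cols[names(inner)] <- as.list(inner)
    } else {
      cols[[nm]] <- col
    }
  }
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Two levels of nesting: 'df' holds 'x' plus another data frame 'deep'.
outer <- data.frame(start = 1:3, end = 2:4)
outer$df <- data.frame(x = c(0.1, 0.2, 0.3))
outer$df$deep <- data.frame(y = letters[1:3])

flat <- unnest_df(outer)
str(flat)
```

Every column of the result is an ordinary atomic vector, so the usual print() and str() methods have nothing nested left to handle.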

Hope this makes sense.

Cheers,
H.


On 07/04/2018 01:38 PM, Jialin Ma wrote:
> Dear all,
> 
> It seems that the devel branch of Bioconductor has made
> changes/improvements to the behavior of as.data.frame. When the
> input is a GRanges with a metadata column that is itself a data frame,
> as.data.frame in devel flattens the nested data frame. Here is an example:
> 
>   library(GenomicRanges)
>   gr <- GRanges("chr2", IRanges(1:6, width = 2))
>   gr$df <- data.frame(x = runif(6))
>   str(as.data.frame(gr))
> 
> which shows:
> 
>    'data.frame':	6 obs. of  6 variables:
>    $ seqnames: Factor w/ 1 level "chr2": 1 1 1 1 1 1
>    $ start   : int  1 2 3 4 5 6
>    $ end     : int  2 3 4 5 6 7
>    $ width   : int  2 2 2 2 2 2
>    $ strand  : Factor w/ 3 levels "+","-","*": 3 3 3 3 3 3
>    $ x       : num  0.55 0.058 0.966 0.75 0.764 ...
> 
> with session info:
> 
>    R version 3.5.0 (2018-04-23)
>    Platform: x86_64-suse-linux-gnu (64-bit)
>    Running under: openSUSE Tumbleweed
> 
>    attached base packages:
>    [1] parallel  stats4    stats     graphics  grDevices utils     datasets
>    [8] methods   base
>    
>    other attached packages:
>    [1] GenomicRanges_1.33.6 GenomeInfoDb_1.17.1  IRanges_2.15.14
>    [4] S4Vectors_0.19.17    BiocGenerics_0.27.1  magrittr_1.5
>    
>    loaded via a namespace (and not attached):
>    [1] zlibbioc_1.27.0        compiler_3.5.0         XVector_0.21.3
>    [4] tools_3.5.0            GenomeInfoDbData_1.1.0 RCurl_1.95-4.10
>    [7] yaml_2.1.19            bitops_1.0-6
>    
> 
> In the old version, the same code gives the following result:
> 
>    'data.frame':	6 obs. of  6 variables:
>    $ seqnames: Factor w/ 1 level "chr2": 1 1 1 1 1 1
>    $ start   : int  1 2 3 4 5 6
>    $ end     : int  2 3 4 5 6 7
>    $ width   : int  2 2 2 2 2 2
>    $ strand  : Factor w/ 3 levels "+","-","*": 3 3 3 3 3 3
>    $ df      :Classes ‘AsIs’ and 'data.frame':	6 obs. of  1 variable:
>      ..$ x: num  0.935 0.577 0.245 0.687 0.194 ...
> 
> with session info:
> 
>    R version 3.5.0 (2018-04-23)
>    Platform: x86_64-suse-linux-gnu (64-bit)
>    Running under: openSUSE Tumbleweed
>    
>    attached base packages:
>    [1] parallel  stats4    stats     graphics  grDevices utils     datasets
>    [8] methods   base
>    
>    other attached packages:
>    [1] GenomicRanges_1.32.3 GenomeInfoDb_1.17.1  IRanges_2.14.10
>    [4] S4Vectors_0.18.3     BiocGenerics_0.27.1  magrittr_1.5
>    
>    loaded via a namespace (and not attached):
>    [1] zlibbioc_1.27.0        compiler_3.5.0         BiocInstaller_1.30.0
>    [4] XVector_0.21.3         tools_3.5.0            GenomeInfoDbData_1.1.0
>    [7] RCurl_1.95-4.10        yaml_2.1.19            bitops_1.0-6
>    
> 
> I personally feel that automatically flattening the nested data frame
> may not be the right behavior. I am not sure about it, but I would
> suggest keeping the data frame column as is when using as.data.frame
> (and not adding the "AsIs" class, since it causes an error when
> displaying the converted data frame).
> 
> Any thoughts?
> 
> Best regards,
> Jialin
> 
> 
> 
> -------- Forwarded Message --------
> From: "Shepherd, Lori" <Lori.Shepherd using RoswellPark.org>
> To: marlin- using gmx.cn <marlin- using gmx.cn>
> Subject: failing Bioconductor package TnT
> Date: Tue, 3 Jul 2018 12:25:20 +0000
> 
>> Dear TnT maintainer,
>>
>> I'd like to bring to your attention that the TnT package is failing
>> to pass 'R CMD build' on all platforms in the devel version of
>> Bioconductor (i.e. BioC 3.8):
>>
>> https://urldefense.proofpoint.com/v2/url?u=http-3A__bioconductor.org_checkResults_devel_bioc-2DLATEST_TnT&d=DwIFaQ&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=Qj-vl9xxsXyBySh08ExrawvLKqjD6wsNm-Ksdv_FY5M&s=F8bgEUvup-gEFW5bhS2Qwar6e7mcBHB5RJ7bpO320-g&e=
>>
>> Would you mind taking a look at this? Don't hesitate to ask on the
>> bioc-devel using r-project.org mailing list if you have any questions
>> or need help.
>>
>>
>> While devel is a place to experiment with new features, we expect
>> packages to build and check cleanly in a reasonable time period and
>> not stay broken for any extended period of time. The package has
>> been failing since 06/11/18.
>>
>> If no action is taken over the next few weeks we will begin the
>> deprecation process for your package.
>>
>>
>> Thank you for your time and effort, and your continued contribution
>> to Bioconductor.
>>
>> Please be advised that Bioconductor has switched from svn to Git. Some
>> helpful links can be found here:
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__bioconductor.org_developers_how-2Dto_git_&d=DwIFaQ&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=Qj-vl9xxsXyBySh08ExrawvLKqjD6wsNm-Ksdv_FY5M&s=sTHnSumyDr9UrxEynYbE2X_wTeyelJEgKiJ5qCh5_y8&e=
>> https://urldefense.proofpoint.com/v2/url?u=http-3A__bioconductor.org_developers_how-2Dto_git_bug-2Dfix-2Din-2Drelease-2Dand-2D&d=DwIFaQ&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=Qj-vl9xxsXyBySh08ExrawvLKqjD6wsNm-Ksdv_FY5M&s=005acfxYDLwSkfUPRJ14v0UbzU6yeYb_6s0TrIgT50k&e=
>> devel/
>>
>>
>>
>> Lori Shepherd
>> Bioconductor Core Team
>> Roswell Park Cancer Institute
>> Department of Biostatistics & Bioinformatics
>> Elm & Carlton Streets
>> Buffalo, New York 14263
>>
> 
> 

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages using fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


