[BioC] second GenomicRanges request - enable query-self comparison (and so ignoreSelf and ignoreRedundant) for GRangesList objects?

Valerie Obenchain vobencha at fhcrc.org
Sun Jun 15 21:35:03 CEST 2014


Great. Thanks.

Val

On 06/14/2014 10:27 PM, Michael Lawrence wrote:
> A GRangesList is a Vector, so that method should work. It just neglected
> to resolve the type argument with match.arg() before forwarding. That's
> now fixed.
>
> Michael
>
>
> On Sat, Jun 14, 2014 at 6:27 PM, Valerie Obenchain <vobencha at fhcrc.org
> <mailto:vobencha at fhcrc.org>> wrote:
>
>     Hi Janet, Michael,
>
>     Janet, were you able to get what you needed by specifying 'type'?
>
>     Michael, which method are you referring to when you say it exists
>     but is broken? The only method I see that allows a missing 'subject'
>     is query="Vector". Maybe we need to add query="GRangesList"
>     subject="missing"?
>
>
>     Val
>
>
>
>     On 06/11/2014 11:15 AM, Michael Lawrence wrote:
>
>         Turns out I added something else. In this case, the method does
>         exist, but
>         it's just broken. You might have better luck if you specify the type
>         argument to findOverlaps, like type="any".
>
>
>         On Wed, Jun 11, 2014 at 10:52 AM, Janet Young <jayoung at fhcrc.org
>         <mailto:jayoung at fhcrc.org>> wrote:
>
>             I'm glad that's on your radar already - nice.  I updated all
>             devel
>             packages this morning - it's not there yet.  Here's my new
>             sessionInfo():
>
>                 sessionInfo()
>
>             R version 3.1.0 Patched (2014-05-26 r65771)
>             Platform: x86_64-unknown-linux-gnu (64-bit)
>
>             locale:
>                [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>                [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>                [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>                [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>                [9] LC_ADDRESS=C               LC_TELEPHONE=C
>             [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
>             attached base packages:
>             [1] parallel  stats     graphics  grDevices utils
>             datasets  methods
>             [8] base
>
>             other attached packages:
>             [1] rtracklayer_1.25.11   GenomicRanges_1.17.18
>             GenomeInfoDb_1.1.7
>             [4] IRanges_1.99.15       S4Vectors_0.0.8
>             BiocGenerics_0.11.2
>
>             loaded via a namespace (and not attached):
>                [1] BatchJobs_1.2            BBmisc_1.6
>             BiocParallel_0.7.2
>
>                [4] Biostrings_2.33.10       bitops_1.0-6
>             brew_1.0-6
>
>                [7] codetools_0.2-8          DBI_0.2-7
>               digest_0.6.4
>
>             [10] fail_1.2                 foreach_1.4.2
>             GenomicAlignments_1.1.14
>             [13] iterators_1.0.7          plyr_1.8.1
>             Rcpp_0.11.2
>
>             [16] RCurl_1.95-4.1           Rsamtools_1.17.25
>               RSQLite_0.11.4
>
>             [19] sendmailR_1.1-2          stats4_3.1.0
>             stringr_0.6.2
>
>             [22] tools_3.1.0              XML_3.98-1.1
>             XVector_0.5.6
>
>             [25] zlibbioc_1.11.1
>
>             On Jun 10, 2014, at 3:31 PM, Michael Lawrence
>             <lawrence.michael at gene.com <mailto:lawrence.michael at gene.com>>
>             wrote:
>
>             I think I might have already added this a couple days ago to
>             devel...
>             tough to keep it all straight in my head
>
>
>             On Tue, Jun 10, 2014 at 3:14 PM, Janet Young
>             <jayoung at fhcrc.org <mailto:jayoung at fhcrc.org>> wrote:
>
>                 Hi again,
>
>                 This is somewhat related to another request I sent
>                 earlier this
>                 afternoon. Is it possible to implement query-self
>                 comparisons (i.e. where I
>                 specify query but not subject) for GRangesList objects?
>                    The motivation is
>                 that I'd like to use the ignoreSelf and ignoreRedundant
>                 options in a
>                 GRangesList comparison.
>
>                 I mentioned in my other request that I'm looking through
>                 a set of genes
>                 to find pairs that overlap on opposite strands.  I'm now
>                 using findOverlaps
>                 on a GRangesList object instead of a GRanges object -
>                 that's because I want
>                 to only look at cases where parts of the final spliced
>                 transcript overlap,
>                 not cases where a large intron-containing gene has
>                 another smaller gene
>                 nested in an intron.  When I used findOverlaps on the
>                 entire genes at once
>                 as GRanges objects, my results included pairs of that
>                 nested type, but if I
>                 use the blocks function to get just the exons as a
>                 GRangesList object, that
>                 lets me successfully ignore the nested gene case, which
>                 is great.  However,
>                 with GRangesList I can't use the query-self comparison
>                 and therefore can't
>                 access those useful ignoreSelf and ignoreRedundant
>                 options.  I know I can
>                 workaround that too with some effort, but it'd be great
>                 to have it as part
>                 of the underlying code.
>
>                 Again, I've included some code below that should show
>                 what I'm trying to
>                 do.
>
>                 all the best,
>
>                 Janet
>
>
>                 library(rtracklayer)
>
>                 #### get some drosophila genes as a test case:
>                 mySession <- browserSession()
>                 genome(mySession) <- "dm3"
>                 genes <- ucscTableQuery (mySession, track="flyBaseGene",
>                 table="flyBaseGene")
>                 genes <- track(genes)
>
>                 #### reduce to a smaller example that contains a gene
>                 pair of the type
>                 I'm talking about (CG33797-RA is nested inside CG11152-RA)
>                 genes <- genes[148:152]
>
>                 #### remove strand info, as a workaround to not being
>                 able to specify
>                 ignore.strand for a query-self findOverlaps call
>                 strand(genes) <- "*"
>
>                 #### using findOverlaps on the genes themselves shows me
>                 the nested pair
>                 (query=3, subject=4)
>                 findOverlaps( genes, ignoreSelf = TRUE, ignoreRedundant
>                 = TRUE)
>
>                 #### so I'll use blocks to extract only the exonic
>                 portions of the genes
>                 as a GRangesList:
>                 genes_GRL <- blocks(genes)
>
>                 #### and use findOverlaps on that GRangesList object,
>                 first by specifying
>                 it as both query and subject in the comparison - this
>                 gives me more or less
>                 what I want (i.e. it does NOT show the nested pair 3-4),
>                 except that
>                 there's a bunch of filtering to do later.
>                 findOverlaps( genes_GRL, genes_GRL)
>
>                 #### ideally, to help me filter the hits I'd like to be
>                 able to use
>                 ignoreSelf and ignoreRedundant, but I can only use those
>                 if it's a
>                 query-self comparison (i.e. only works if no subject is
>                 specified)
>                 findOverlaps( genes_GRL, genes_GRL, ignoreSelf=TRUE,
>                 ignoreRedundant=TRUE)
>                 # Error in .local(query, subject, maxgap, minoverlap,
>                 type, select, ...) :
>                 #   unused arguments (ignoreSelf = TRUE, ignoreRedundant
>                 = TRUE)
>
>                 #### and it looks like findOverlaps is not implemented
>                 for the query-self
>                 case for GRangesList objects
>                 findOverlaps( genes_GRL)
>                 # Error in match.arg(type) : 'arg' must be of length 1
>
>                 sessionInfo()
>
>                 R version 3.1.0 Patched (2014-05-26 r65771)
>                 Platform: x86_64-unknown-linux-gnu (64-bit)
>
>                 locale:
>                    [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>                    [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>                    [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>                    [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>                    [9] LC_ADDRESS=C               LC_TELEPHONE=C
>                 [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
>                 attached base packages:
>                 [1] parallel  stats     graphics  grDevices utils
>                 datasets  methods
>                 [8] base
>
>                 other attached packages:
>                 [1] rtracklayer_1.25.11   GenomicRanges_1.17.17
>                 GenomeInfoDb_1.1.6
>                 [4] IRanges_1.99.15       S4Vectors_0.0.8
>                 BiocGenerics_0.11.2
>
>                 loaded via a namespace (and not attached):
>                    [1] BatchJobs_1.2            BBmisc_1.6
>                 BiocParallel_0.7.2
>                    [4] Biostrings_2.33.10       bitops_1.0-6
>                 brew_1.0-6
>                    [7] codetools_0.2-8          DBI_0.2-7
>                   digest_0.6.4
>                 [10] fail_1.2                 foreach_1.4.2
>                    GenomicAlignments_1.1.13
>                 [13] iterators_1.0.7          plyr_1.8.1
>                 Rcpp_0.11.2
>                 [16] RCurl_1.95-4.1           Rsamtools_1.17.23
>                   RSQLite_0.11.4
>                 [19] sendmailR_1.1-2          stats4_3.1.0
>                 stringr_0.6.2
>                 [22] tools_3.1.0              XML_3.98-1.1
>                 XVector_0.5.6
>                 [25] zlibbioc_1.11.1
>
>                 _________________________________________________
>                 Bioconductor mailing list
>                 Bioconductor at r-project.org
>                 <mailto:Bioconductor at r-project.org>
>                 https://stat.ethz.ch/mailman/__listinfo/bioconductor
>                 <https://stat.ethz.ch/mailman/listinfo/bioconductor>
>                 Search the archives:
>                 http://news.gmane.org/gmane.__science.biology.informatics.__conductor
>                 <http://news.gmane.org/gmane.science.biology.informatics.conductor>
>
>
>
>
>
>                  [[alternative HTML version deleted]]
>
>
>         _________________________________________________
>         Bioconductor mailing list
>         Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
>         https://stat.ethz.ch/mailman/__listinfo/bioconductor
>         <https://stat.ethz.ch/mailman/listinfo/bioconductor>
>         Search the archives:
>         http://news.gmane.org/gmane.__science.biology.informatics.__conductor
>         <http://news.gmane.org/gmane.science.biology.informatics.conductor>
>
>
>



More information about the Bioconductor mailing list