[BioC] second GenomicRanges request - enable query-self comparison (and so ignoreSelf and ignoreRedundant) for GRangesList objects?
Valerie Obenchain
vobencha at fhcrc.org
Sun Jun 15 21:35:03 CEST 2014
Great. Thanks.
Val
On 06/14/2014 10:27 PM, Michael Lawrence wrote:
> A GRangesList is a Vector, so that method should work. It just neglected
> to resolve the type argument with match.arg() before forwarding. That's
> now fixed.
>
> Michael
>
>
> On Sat, Jun 14, 2014 at 6:27 PM, Valerie Obenchain <vobencha at fhcrc.org
> <mailto:vobencha at fhcrc.org>> wrote:
>
> Hi Janet, Michael,
>
> Janet, were you able to get what you needed by specifying 'type'?
>
> Michael, which method are you referring to when you say it exists
> but is broken? The only method I see that allows a missing 'subject'
> is query="Vector". Maybe we need to add query="GRangesList"
> subject="missing"?
>
>
> Val
>
>
>
> On 06/11/2014 11:15 AM, Michael Lawrence wrote:
>
> Turns out I added something else. In this case, the method does
> exist, but
> it's just broken. You might have better luck if you specify the type
> argument to findOverlaps, like type="any".
>
>
> On Wed, Jun 11, 2014 at 10:52 AM, Janet Young <jayoung at fhcrc.org
> <mailto:jayoung at fhcrc.org>> wrote:
>
> I'm glad that's on your radar already - nice. I updated all
> devel
> packages this morning - it's not there yet. Here's my new
> sessionInfo():
>
> sessionInfo()
>
> R version 3.1.0 Patched (2014-05-26 r65771)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel stats graphics grDevices utils
> datasets methods
> [8] base
>
> other attached packages:
> [1] rtracklayer_1.25.11 GenomicRanges_1.17.18
> GenomeInfoDb_1.1.7
> [4] IRanges_1.99.15 S4Vectors_0.0.8
> BiocGenerics_0.11.2
>
> loaded via a namespace (and not attached):
> [1] BatchJobs_1.2 BBmisc_1.6
> BiocParallel_0.7.2
>
> [4] Biostrings_2.33.10 bitops_1.0-6
> brew_1.0-6
>
> [7] codetools_0.2-8 DBI_0.2-7
> digest_0.6.4
>
> [10] fail_1.2 foreach_1.4.2
> GenomicAlignments_1.1.14
> [13] iterators_1.0.7 plyr_1.8.1
> Rcpp_0.11.2
>
> [16] RCurl_1.95-4.1 Rsamtools_1.17.25
> RSQLite_0.11.4
>
> [19] sendmailR_1.1-2 stats4_3.1.0
> stringr_0.6.2
>
> [22] tools_3.1.0 XML_3.98-1.1
> XVector_0.5.6
>
> [25] zlibbioc_1.11.1
>
> On Jun 10, 2014, at 3:31 PM, Michael Lawrence
> <lawrence.michael at gene.com <mailto:lawrence.michael at gene.com>>
> wrote:
>
> I think I might have already added this a couple days ago to
> devel...
> tough to keep it all straight in my head
>
>
> On Tue, Jun 10, 2014 at 3:14 PM, Janet Young
> <jayoung at fhcrc.org <mailto:jayoung at fhcrc.org>> wrote:
>
> Hi again,
>
> This is somewhat related to another request I sent
> earlier this
> afternoon. Is it possible to implement query-self
> comparisons (i.e. where I
> specify query but not subject) for GRangesList objects?
> The motivation is
> that I'd like to use the ignoreSelf and ignoreRedundant
> options in a
> GRangesList comparison.
>
> I mentioned in my other request that I'm looking through
> a set of genes
> to find pairs that overlap on opposite strands. I'm now
> using findOverlaps
> on a GRangesList object instead of a GRanges object -
> that's because I want
> to only look at cases where parts of the final spliced
> transcript overlap,
> not cases where a large intron-containing gene has
> another smaller gene
> nested in an intron. When I used findOverlaps on the
> entire genes at once
> as GRanges objects, my results included pairs of that
> nested type, but if I
> use the blocks function to get just the exons as a
> GRangesList object, that
> lets me successfully ignore the nested gene case, which
> is great. However,
> with GRangesList I can't use the query-self comparison
> and therefore can't
> access those useful ignoreSelf and ignoreRedundant
> options. I know I can
> workaround that too with some effort, but it'd be great
> to have it as part
> of the underlying code.
>
> Again, I've included some code below that should show
> what I'm trying to
> do.
>
> all the best,
>
> Janet
>
>
> library(rtracklayer)
>
> #### get some drosophila genes as a test case:
> mySession <- browserSession()
> genome(mySession) <- "dm3"
> genes <- ucscTableQuery (mySession, track="flyBaseGene",
> table="flyBaseGene")
> genes <- track(genes)
>
> #### reduce to a smaller example that contains a gene
> pair of the type
> I'm talking about (CG33797-RA is nested inside CG11152-RA)
> genes <- genes[148:152]
>
> #### remove strand info, as a workaround to not being
> able to specify
> ignore.strand for a query-self findOverlaps call
> strand(genes) <- "*"
>
> #### using findOverlaps on the genes themselves shows me
> the nested pair
> (query=3, subject=4)
> findOverlaps( genes, ignoreSelf = TRUE, ignoreRedundant
> = TRUE)
>
> #### so I'll use blocks to extract only the exonic
> portions of the genes
> as a GRangesList:
> genes_GRL <- blocks(genes)
>
> #### and use findOverlaps on that GRangesList object,
> first by specifying
> it as both query and subject in the comparison - this
> gives me more or less
> what I want (i.e. it does NOT show the nested pair 3-4),
> except that
> there's a bunch of filtering to do later.
> findOverlaps( genes_GRL, genes_GRL)
>
> #### ideally, to help me filter the hits I'd like to be
> able to use
> ignoreSelf and ignoreRedundant, but I can only use those
> if it's a
> query-self comparison (i.e. only works if no subject is
> specified)
> findOverlaps( genes_GRL, genes_GRL, ignoreSelf=TRUE,
> ignoreRedundant=TRUE)
> # Error in .local(query, subject, maxgap, minoverlap,
> type, select, ...) :
> # unused arguments (ignoreSelf = TRUE, ignoreRedundant
> = TRUE)
>
> #### and it looks like findOverlaps is not implemented
> for the query-self
> case for GRangesList objects
> findOverlaps( genes_GRL)
> # Error in match.arg(type) : 'arg' must be of length 1
>
> sessionInfo()
>
> R version 3.1.0 Patched (2014-05-26 r65771)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel stats graphics grDevices utils
> datasets methods
> [8] base
>
> other attached packages:
> [1] rtracklayer_1.25.11 GenomicRanges_1.17.17
> GenomeInfoDb_1.1.6
> [4] IRanges_1.99.15 S4Vectors_0.0.8
> BiocGenerics_0.11.2
>
> loaded via a namespace (and not attached):
> [1] BatchJobs_1.2 BBmisc_1.6
> BiocParallel_0.7.2
> [4] Biostrings_2.33.10 bitops_1.0-6
> brew_1.0-6
> [7] codetools_0.2-8 DBI_0.2-7
> digest_0.6.4
> [10] fail_1.2 foreach_1.4.2
> GenomicAlignments_1.1.13
> [13] iterators_1.0.7 plyr_1.8.1
> Rcpp_0.11.2
> [16] RCurl_1.95-4.1 Rsamtools_1.17.23
> RSQLite_0.11.4
> [19] sendmailR_1.1-2 stats4_3.1.0
> stringr_0.6.2
> [22] tools_3.1.0 XML_3.98-1.1
> XVector_0.5.6
> [25] zlibbioc_1.11.1
>
> _________________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> <mailto:Bioconductor at r-project.org>
> https://stat.ethz.ch/mailman/__listinfo/bioconductor
> <https://stat.ethz.ch/mailman/listinfo/bioconductor>
> Search the archives:
> http://news.gmane.org/gmane.__science.biology.informatics.__conductor
> <http://news.gmane.org/gmane.science.biology.informatics.conductor>
>
>
>
>
>
> [[alternative HTML version deleted]]
>
>
> _________________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
> https://stat.ethz.ch/mailman/__listinfo/bioconductor
> <https://stat.ethz.ch/mailman/listinfo/bioconductor>
> Search the archives:
> http://news.gmane.org/gmane.__science.biology.informatics.__conductor
> <http://news.gmane.org/gmane.science.biology.informatics.conductor>
>
>
>
More information about the Bioconductor
mailing list