[BioC] second GenomicRanges request - enable query-self comparison (and so ignoreSelf and ignoreRedundant) for GRangesList objects?
Valerie Obenchain
vobencha at fhcrc.org
Wed Jun 18 22:50:22 CEST 2014
I think Michael forgot to bump the version after his fix so the package
didn't propagate. It's bumped now and 1.99.16 should be available w/
biocLite() by noon tomorrow or in svn immediately.
findOverlaps() now works on a GRangesList w/ no subject and the ignore*
args as necessary. The default for 'type' is "any" for all
findOverlaps() methods.
example(GRangesList) ## for 'grl'
>> findOverlaps(grl)
> Hits of length 3
> queryLength: 3
> subjectLength: 3
> queryHits subjectHits
> <integer> <integer>
> 1 1 1
> 2 2 2
> 3 3 3
Valerie
On 06/17/2014 11:54 AM, Janet Young wrote:
> Thanks very much all - yes, I can now get what I want by adding type="any":
> findOverlaps( genes_GRL, type="any", ignoreSelf=TRUE, ignoreRedundant=TRUE)
> I guess in the long term that'll be unnecessary?
It seems to make sense to have a default of type="any", to be
consistent with all the other findOverlaps methods.
>
> Janet
>
>
>
>
> On Jun 15, 2014, at 12:35 PM, Valerie Obenchain <vobencha at fhcrc.org> wrote:
>
>> Great. Thanks.
>>
>> Val
>>
>> On 06/14/2014 10:27 PM, Michael Lawrence wrote:
>>> A GRangesList is a Vector, so that method should work. It just neglected
>>> to resolve the type argument with match.arg() before forwarding. That's
>>> now fixed.
>>>
>>> Michael
>>>
>>>
>>> On Sat, Jun 14, 2014 at 6:27 PM, Valerie Obenchain <vobencha at fhcrc.org
>>> <mailto:vobencha at fhcrc.org>> wrote:
>>>
>>> Hi Janet, Michael,
>>>
>>> Janet, were you able to get what you needed by specifying 'type'?
>>>
>>> Michael, which method are you referring to when you say it exists
>>> but is broken? The only method I see that allows a missing 'subject'
>>> is query="Vector". Maybe we need to add query="GRangesList"
>>> subject="missing"?
>>>
>>>
>>> Val
>>>
>>>
>>>
>>> On 06/11/2014 11:15 AM, Michael Lawrence wrote:
>>>
>>> Turns out I added something else. In this case, the method does
>>> exist, but
>>> it's just broken. You might have better luck if you specify the type
>>> argument to findOverlaps, like type="any".
>>>
>>>
>>> On Wed, Jun 11, 2014 at 10:52 AM, Janet Young <jayoung at fhcrc.org
>>> <mailto:jayoung at fhcrc.org>> wrote:
>>>
>>> I'm glad that's on your radar already - nice. I updated all
>>> devel
>>> packages this morning - it's not there yet. Here's my new
>>> sessionInfo():
>>>
>>> sessionInfo()
>>>
>>> R version 3.1.0 Patched (2014-05-26 r65771)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>> locale:
>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] parallel stats graphics grDevices utils
>>> datasets methods
>>> [8] base
>>>
>>> other attached packages:
>>> [1] rtracklayer_1.25.11 GenomicRanges_1.17.18
>>> GenomeInfoDb_1.1.7
>>> [4] IRanges_1.99.15 S4Vectors_0.0.8
>>> BiocGenerics_0.11.2
>>>
>>> loaded via a namespace (and not attached):
>>> [1] BatchJobs_1.2 BBmisc_1.6
>>> BiocParallel_0.7.2
>>>
>>> [4] Biostrings_2.33.10 bitops_1.0-6
>>> brew_1.0-6
>>>
>>> [7] codetools_0.2-8 DBI_0.2-7
>>> digest_0.6.4
>>>
>>> [10] fail_1.2 foreach_1.4.2
>>> GenomicAlignments_1.1.14
>>> [13] iterators_1.0.7 plyr_1.8.1
>>> Rcpp_0.11.2
>>>
>>> [16] RCurl_1.95-4.1 Rsamtools_1.17.25
>>> RSQLite_0.11.4
>>>
>>> [19] sendmailR_1.1-2 stats4_3.1.0
>>> stringr_0.6.2
>>>
>>> [22] tools_3.1.0 XML_3.98-1.1
>>> XVector_0.5.6
>>>
>>> [25] zlibbioc_1.11.1
>>>
>>> On Jun 10, 2014, at 3:31 PM, Michael Lawrence
>>> <lawrence.michael at gene.com <mailto:lawrence.michael at gene.com>>
>>> wrote:
>>>
>>> I think I might have already added this a couple days ago to
>>> devel...
>>> tough to keep it all straight in my head
>>>
>>>
>>> On Tue, Jun 10, 2014 at 3:14 PM, Janet Young
>>> <jayoung at fhcrc.org <mailto:jayoung at fhcrc.org>> wrote:
>>>
>>> Hi again,
>>>
>>> This is somewhat related to another request I sent
>>> earlier this
>>> afternoon. Is it possible to implement query-self
>>> comparisons (i.e. where I
>>> specify query but not subject) for GRangesList objects?
>>> The motivation is
>>> that I'd like to use the ignoreSelf and ignoreRedundant
>>> options in a
>>> GRangesList comparison.
>>>
>>> I mentioned in my other request that I'm looking through
>>> a set of genes
>>> to find pairs that overlap on opposite strands. I'm now
>>> using findOverlaps
>>> on a GRangesList object instead of a GRanges object -
>>> that's because I want
>>> to only look at cases where parts of the final spliced
>>> transcript overlap,
>>> not cases where a large intron-containing gene has
>>> another smaller gene
>>> nested in an intron. When I used findOverlaps on the
>>> entire genes at once
>>> as GRanges objects, my results included pairs of that
>>> nested type, but if I
>>> use the blocks function to get just the exons as a
>>> GRangesList object, that
>>> lets me successfully ignore the nested gene case, which
>>> is great. However,
>>> with GRangesList I can't use the query-self comparison
>>> and therefore can't
>>> access those useful ignoreSelf and ignoreRedundant
>>> options. I know I can
>>> workaround that too with some effort, but it'd be great
>>> to have it as part
>>> of the underlying code.
>>>
>>> Again, I've included some code below that should show
>>> what I'm trying to
>>> do.
>>>
>>> all the best,
>>>
>>> Janet
>>>
>>>
>>> library(rtracklayer)
>>>
>>> #### get some drosophila genes as a test case:
>>> mySession <- browserSession()
>>> genome(mySession) <- "dm3"
>>> genes <- ucscTableQuery (mySession, track="flyBaseGene",
>>> table="flyBaseGene")
>>> genes <- track(genes)
>>>
>>> #### reduce to a smaller example that contains a gene
>>> pair of the type
>>> I'm talking about (CG33797-RA is nested inside CG11152-RA)
>>> genes <- genes[148:152]
>>>
>>> #### remove strand info, as a workaround to not being
>>> able to specify
>>> ignore.strand for a query-self findOverlaps call
>>> strand(genes) <- "*"
>>>
>>> #### using findOverlaps on the genes themselves shows me
>>> the nested pair
>>> (query=3, subject=4)
>>> findOverlaps( genes, ignoreSelf = TRUE, ignoreRedundant
>>> = TRUE)
>>>
>>> #### so I'll use blocks to extract only the exonic
>>> portions of the genes
>>> as a GRangesList:
>>> genes_GRL <- blocks(genes)
>>>
>>> #### and use findOverlaps on that GRangesList object,
>>> first by specifying
>>> it as both query and subject in the comparison - this
>>> gives me more or less
>>> what I want (i.e. it does NOT show the nested pair 3-4),
>>> except that
>>> there's a bunch of filtering to do later.
>>> findOverlaps( genes_GRL, genes_GRL)
>>>
>>> #### ideally, to help me filter the hits I'd like to be
>>> able to use
>>> ignoreSelf and ignoreRedundant, but I can only use those
>>> if it's a
>>> query-self comparison (i.e. only works if no subject is
>>> specified)
>>> findOverlaps( genes_GRL, genes_GRL, ignoreSelf=TRUE,
>>> ignoreRedundant=TRUE)
>>> # Error in .local(query, subject, maxgap, minoverlap,
>>> type, select, ...) :
>>> # unused arguments (ignoreSelf = TRUE, ignoreRedundant
>>> = TRUE)
>>>
>>> #### and it looks like findOverlaps is not implemented
>>> for the query-self
>>> case for GRangesList objects
>>> findOverlaps( genes_GRL)
>>> # Error in match.arg(type) : 'arg' must be of length 1
>>>
>>> sessionInfo()
>>>
>>> R version 3.1.0 Patched (2014-05-26 r65771)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>> locale:
>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] parallel stats graphics grDevices utils
>>> datasets methods
>>> [8] base
>>>
>>> other attached packages:
>>> [1] rtracklayer_1.25.11 GenomicRanges_1.17.17
>>> GenomeInfoDb_1.1.6
>>> [4] IRanges_1.99.15 S4Vectors_0.0.8
>>> BiocGenerics_0.11.2
>>>
>>> loaded via a namespace (and not attached):
>>> [1] BatchJobs_1.2 BBmisc_1.6
>>> BiocParallel_0.7.2
>>> [4] Biostrings_2.33.10 bitops_1.0-6
>>> brew_1.0-6
>>> [7] codetools_0.2-8 DBI_0.2-7
>>> digest_0.6.4
>>> [10] fail_1.2 foreach_1.4.2
>>> GenomicAlignments_1.1.13
>>> [13] iterators_1.0.7 plyr_1.8.1
>>> Rcpp_0.11.2
>>> [16] RCurl_1.95-4.1 Rsamtools_1.17.23
>>> RSQLite_0.11.4
>>> [19] sendmailR_1.1-2 stats4_3.1.0
>>> stringr_0.6.2
>>> [22] tools_3.1.0 XML_3.98-1.1
>>> XVector_0.5.6
>>> [25] zlibbioc_1.11.1
>>>
>>> _________________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> <mailto:Bioconductor at r-project.org>
>>> https://stat.ethz.ch/mailman/__listinfo/bioconductor
>>> <https://stat.ethz.ch/mailman/listinfo/bioconductor>
>>> Search the archives:
>>> http://news.gmane.org/gmane.__science.biology.informatics.__conductor
>>> <http://news.gmane.org/gmane.science.biology.informatics.conductor>
>>>
>>>
>>>
>>>
>>>
>>> [[alternative HTML version deleted]]
>>>
>>>
>>> _________________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
>>> https://stat.ethz.ch/mailman/__listinfo/bioconductor
>>> <https://stat.ethz.ch/mailman/listinfo/bioconductor>
>>> Search the archives:
>>> http://news.gmane.org/gmane.__science.biology.informatics.__conductor
>>> <http://news.gmane.org/gmane.science.biology.informatics.conductor>
>>>
>>>
>>>
>>
>
More information about the Bioconductor
mailing list