[BioC] second GenomicRanges request - enable query-self comparison (and so ignoreSelf and ignoreRedundant) for GRangesList objects?
Janet Young
jayoung at fhcrc.org
Fri Jun 20 19:58:52 CEST 2014
thanks very much - got it now, and it's working without the type="any". Nice!
j
On Jun 18, 2014, at 1:50 PM, Valerie Obenchain <vobencha at fhcrc.org> wrote:
> I think Michael forgot to bump the version after his fix so the package didn't propagate. It's bumped now and 1.99.16 should be available w/ biocLite() by noon tomorrow or in svn immediately.
>
> findOverlaps() now works on a GRangesList w/ no subject and the ignore* args as necessary. The default for 'type' is "any" for all findOverlaps() methods.
>
> example(GRangesList) ## for 'grl'
>>> findOverlaps(grl)
>> Hits of length 3
>> queryLength: 3
>> subjectLength: 3
>> queryHits subjectHits
>> <integer> <integer>
>> 1 1 1
>> 2 2 2
>> 3 3 3
>
>
> Valerie
>
> On 06/17/2014 11:54 AM, Janet Young wrote:
>> Thanks very much all - yes, I can now get what I want by adding type="any":
>> findOverlaps( genes_GRL, type="any", ignoreSelf=TRUE, ignoreRedundant=TRUE)
>> I guess in the long term that'll be unnecessary?
> It seems to make sense to have a default of type="any", to be consistent with all the other findOverlaps methods.
>>
>> Janet
>>
>>
>>
>>
>> On Jun 15, 2014, at 12:35 PM, Valerie Obenchain <vobencha at fhcrc.org> wrote:
>>
>>> Great. Thanks.
>>>
>>> Val
>>>
>>> On 06/14/2014 10:27 PM, Michael Lawrence wrote:
>>>> A GRangesList is a Vector, so that method should work. It just neglected
>>>> to resolve the type argument with match.arg() before forwarding. That's
>>>> now fixed.
>>>>
>>>> Michael
>>>>
>>>>
>>>> On Sat, Jun 14, 2014 at 6:27 PM, Valerie Obenchain <vobencha at fhcrc.org
>>>> <mailto:vobencha at fhcrc.org>> wrote:
>>>>
>>>> Hi Janet, Michael,
>>>>
>>>> Janet, were you able to get what you needed by specifying 'type'?
>>>>
>>>> Michael, which method are you referring to when you say it exists
>>>> but is broken? The only method I see that allows a missing 'subject'
>>>> is query="Vector". Maybe we need to add query="GRangesList"
>>>> subject="missing"?
>>>>
>>>>
>>>> Val
>>>>
>>>>
>>>>
>>>> On 06/11/2014 11:15 AM, Michael Lawrence wrote:
>>>>
>>>> Turns out I added something else. In this case, the method does
>>>> exist, but
>>>> it's just broken. You might have better luck if you specify the type
>>>> argument to findOverlaps, like type="any".
>>>>
>>>>
>>>> On Wed, Jun 11, 2014 at 10:52 AM, Janet Young <jayoung at fhcrc.org
>>>> <mailto:jayoung at fhcrc.org>> wrote:
>>>>
>>>> I'm glad that's on your radar already - nice. I updated all
>>>> devel
>>>> packages this morning - it's not there yet. Here's my new
>>>> sessionInfo():
>>>>
>>>> sessionInfo()
>>>>
>>>> R version 3.1.0 Patched (2014-05-26 r65771)
>>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>>
>>>> locale:
>>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>
>>>> attached base packages:
>>>> [1] parallel stats graphics grDevices utils
>>>> datasets methods
>>>> [8] base
>>>>
>>>> other attached packages:
>>>> [1] rtracklayer_1.25.11 GenomicRanges_1.17.18
>>>> GenomeInfoDb_1.1.7
>>>> [4] IRanges_1.99.15 S4Vectors_0.0.8
>>>> BiocGenerics_0.11.2
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] BatchJobs_1.2 BBmisc_1.6
>>>> BiocParallel_0.7.2
>>>>
>>>> [4] Biostrings_2.33.10 bitops_1.0-6
>>>> brew_1.0-6
>>>>
>>>> [7] codetools_0.2-8 DBI_0.2-7
>>>> digest_0.6.4
>>>>
>>>> [10] fail_1.2 foreach_1.4.2
>>>> GenomicAlignments_1.1.14
>>>> [13] iterators_1.0.7 plyr_1.8.1
>>>> Rcpp_0.11.2
>>>>
>>>> [16] RCurl_1.95-4.1 Rsamtools_1.17.25
>>>> RSQLite_0.11.4
>>>>
>>>> [19] sendmailR_1.1-2 stats4_3.1.0
>>>> stringr_0.6.2
>>>>
>>>> [22] tools_3.1.0 XML_3.98-1.1
>>>> XVector_0.5.6
>>>>
>>>> [25] zlibbioc_1.11.1
>>>>
>>>> On Jun 10, 2014, at 3:31 PM, Michael Lawrence
>>>> <lawrence.michael at gene.com <mailto:lawrence.michael at gene.com>>
>>>> wrote:
>>>>
>>>> I think I might have already added this a couple days ago to
>>>> devel...
>>>> tough to keep it all straight in my head
>>>>
>>>>
>>>> On Tue, Jun 10, 2014 at 3:14 PM, Janet Young
>>>> <jayoung at fhcrc.org <mailto:jayoung at fhcrc.org>> wrote:
>>>>
>>>> Hi again,
>>>>
>>>> This is somewhat related to another request I sent
>>>> earlier this
>>>> afternoon. Is it possible to implement query-self
>>>> comparisons (i.e. where I
>>>> specify query but not subject) for GRangesList objects?
>>>> The motivation is
>>>> that I'd like to use the ignoreSelf and ignoreRedundant
>>>> options in a
>>>> GRangesList comparison.
>>>>
>>>> I mentioned in my other request that I'm looking through
>>>> a set of genes
>>>> to find pairs that overlap on opposite strands. I'm now
>>>> using findOverlaps
>>>> on a GRangesList object instead of a GRanges object -
>>>> that's because I want
>>>> to only look at cases where parts of the final spliced
>>>> transcript overlap,
>>>> not cases where a large intron-containing gene has
>>>> another smaller gene
>>>> nested in an intron. When I used findOverlaps on the
>>>> entire genes at once
>>>> as GRanges objects, my results included pairs of that
>>>> nested type, but if I
>>>> use the blocks function to get just the exons as a
>>>> GRangesList object, that
>>>> lets me successfully ignore the nested gene case, which
>>>> is great. However,
>>>> with GRangesList I can't use the query-self comparison
>>>> and therefore can't
>>>> access those useful ignoreSelf and ignoreRedundant
>>>> options. I know I can
>>>> workaround that too with some effort, but it'd be great
>>>> to have it as part
>>>> of the underlying code.
>>>>
>>>> Again, I've included some code below that should show
>>>> what I'm trying to
>>>> do.
>>>>
>>>> all the best,
>>>>
>>>> Janet
>>>>
>>>>
>>>> library(rtracklayer)
>>>>
>>>> #### get some drosophila genes as a test case:
>>>> mySession <- browserSession()
>>>> genome(mySession) <- "dm3"
>>>> genes <- ucscTableQuery (mySession, track="flyBaseGene",
>>>> table="flyBaseGene")
>>>> genes <- track(genes)
>>>>
>>>> #### reduce to a smaller example that contains a gene
>>>> pair of the type
>>>> I'm talking about (CG33797-RA is nested inside CG11152-RA)
>>>> genes <- genes[148:152]
>>>>
>>>> #### remove strand info, as a workaround to not being
>>>> able to specify
>>>> ignore.strand for a query-self findOverlaps call
>>>> strand(genes) <- "*"
>>>>
>>>> #### using findOverlaps on the genes themselves shows me
>>>> the nested pair
>>>> (query=3, subject=4)
>>>> findOverlaps( genes, ignoreSelf = TRUE, ignoreRedundant
>>>> = TRUE)
>>>>
>>>> #### so I'll use blocks to extract only the exonic
>>>> portions of the genes
>>>> as a GRangesList:
>>>> genes_GRL <- blocks(genes)
>>>>
>>>> #### and use findOverlaps on that GRangesList object,
>>>> first by specifying
>>>> it as both query and subject in the comparison - this
>>>> gives me more or less
>>>> what I want (i.e. it does NOT show the nested pair 3-4),
>>>> except that
>>>> there's a bunch of filtering to do later.
>>>> findOverlaps( genes_GRL, genes_GRL)
>>>>
>>>> #### ideally, to help me filter the hits I'd like to be
>>>> able to use
>>>> ignoreSelf and ignoreRedundant, but I can only use those
>>>> if it's a
>>>> query-self comparison (i.e. only works if no subject is
>>>> specified)
>>>> findOverlaps( genes_GRL, genes_GRL, ignoreSelf=TRUE,
>>>> ignoreRedundant=TRUE)
>>>> # Error in .local(query, subject, maxgap, minoverlap,
>>>> type, select, ...) :
>>>> # unused arguments (ignoreSelf = TRUE, ignoreRedundant
>>>> = TRUE)
>>>>
>>>> #### and it looks like findOverlaps is not implemented
>>>> for the query-self
>>>> case for GRangesList objects
>>>> findOverlaps( genes_GRL)
>>>> # Error in match.arg(type) : 'arg' must be of length 1
>>>>
>>>> sessionInfo()
>>>>
>>>> R version 3.1.0 Patched (2014-05-26 r65771)
>>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>>
>>>> locale:
>>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>
>>>> attached base packages:
>>>> [1] parallel stats graphics grDevices utils
>>>> datasets methods
>>>> [8] base
>>>>
>>>> other attached packages:
>>>> [1] rtracklayer_1.25.11 GenomicRanges_1.17.17
>>>> GenomeInfoDb_1.1.6
>>>> [4] IRanges_1.99.15 S4Vectors_0.0.8
>>>> BiocGenerics_0.11.2
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] BatchJobs_1.2 BBmisc_1.6
>>>> BiocParallel_0.7.2
>>>> [4] Biostrings_2.33.10 bitops_1.0-6
>>>> brew_1.0-6
>>>> [7] codetools_0.2-8 DBI_0.2-7
>>>> digest_0.6.4
>>>> [10] fail_1.2 foreach_1.4.2
>>>> GenomicAlignments_1.1.13
>>>> [13] iterators_1.0.7 plyr_1.8.1
>>>> Rcpp_0.11.2
>>>> [16] RCurl_1.95-4.1 Rsamtools_1.17.23
>>>> RSQLite_0.11.4
>>>> [19] sendmailR_1.1-2 stats4_3.1.0
>>>> stringr_0.6.2
>>>> [22] tools_3.1.0 XML_3.98-1.1
>>>> XVector_0.5.6
>>>> [25] zlibbioc_1.11.1
>>>>
>>>> _________________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> <mailto:Bioconductor at r-project.org>
>>>> https://stat.ethz.ch/mailman/__listinfo/bioconductor
>>>> <https://stat.ethz.ch/mailman/listinfo/bioconductor>
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.__science.biology.informatics.__conductor
>>>> <http://news.gmane.org/gmane.science.biology.informatics.conductor>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> [[alternative HTML version deleted]]
>>>>
>>>>
>>>> _________________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
>>>> https://stat.ethz.ch/mailman/__listinfo/bioconductor
>>>> <https://stat.ethz.ch/mailman/listinfo/bioconductor>
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.__science.biology.informatics.__conductor
>>>> <http://news.gmane.org/gmane.science.biology.informatics.conductor>
>>>>
>>>>
>>>>
>>>
>>
>
More information about the Bioconductor
mailing list