[BioC] second GenomicRanges request - enable query-self comparison (and so ignoreSelf and ignoreRedundant) for GRangesList objects?
    Valerie Obenchain 
    vobencha at fhcrc.org
       
    Sun Jun 15 03:27:37 CEST 2014
    
    
  
Hi Janet, Michael,
Janet, were you able to get what you needed by specifying 'type'?
Michael, which method are you referring to when you say it exists but is 
broken? The only method I see that allows a missing 'subject' is 
query="Vector". Maybe we need to add query="GRangesList" subject="missing"?
Val
On 06/11/2014 11:15 AM, Michael Lawrence wrote:
> Turns out I added something else. In this case, the method does exist, but
> it's just broken. You might have better luck if you specify the type
> argument to findOverlaps, like type="any".
>
>
> On Wed, Jun 11, 2014 at 10:52 AM, Janet Young <jayoung at fhcrc.org> wrote:
>
>> I'm glad that's on your radar already - nice.  I updated all devel
>> packages this morning - it's not there yet.  Here's my new sessionInfo():
>>
>>> sessionInfo()
>> R version 3.1.0 Patched (2014-05-26 r65771)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>> [8] base
>>
>> other attached packages:
>> [1] rtracklayer_1.25.11   GenomicRanges_1.17.18 GenomeInfoDb_1.1.7
>> [4] IRanges_1.99.15       S4Vectors_0.0.8       BiocGenerics_0.11.2
>>
>> loaded via a namespace (and not attached):
>>   [1] BatchJobs_1.2            BBmisc_1.6               BiocParallel_0.7.2
>>
>>   [4] Biostrings_2.33.10       bitops_1.0-6             brew_1.0-6
>>
>>   [7] codetools_0.2-8          DBI_0.2-7                digest_0.6.4
>>
>> [10] fail_1.2                 foreach_1.4.2
>> GenomicAlignments_1.1.14
>> [13] iterators_1.0.7          plyr_1.8.1               Rcpp_0.11.2
>>
>> [16] RCurl_1.95-4.1           Rsamtools_1.17.25        RSQLite_0.11.4
>>
>> [19] sendmailR_1.1-2          stats4_3.1.0             stringr_0.6.2
>>
>> [22] tools_3.1.0              XML_3.98-1.1             XVector_0.5.6
>>
>> [25] zlibbioc_1.11.1
>>
>> On Jun 10, 2014, at 3:31 PM, Michael Lawrence <lawrence.michael at gene.com>
>> wrote:
>>
>> I think I might have already added this a couple days ago to devel...
>> tough to keep it all straight in my head
>>
>>
>> On Tue, Jun 10, 2014 at 3:14 PM, Janet Young <jayoung at fhcrc.org> wrote:
>>
>>> Hi again,
>>>
>>> This is somewhat related to another request I sent earlier this
>>> afternoon. Is it possible to implement query-self comparisons (i.e. where I
>>> specify query but not subject) for GRangesList objects?   The motivation is
>>> that I'd like to use the ignoreSelf and ignoreRedundant options in a
>>> GRangesList comparison.
>>>
>>> I mentioned in my other request that I'm looking through a set of genes
>>> to find pairs that overlap on opposite strands.  I'm now using findOverlaps
>>> on a GRangesList object instead of a GRanges object - that's because I want
>>> to only look at cases where parts of the final spliced transcript overlap,
>>> not cases where a large intron-containing gene has another smaller gene
>>> nested in an intron.  When I used findOverlaps on the entire genes at once
>>> as GRanges objects, my results included pairs of that nested type, but if I
>>> use the blocks function to get just the exons as a GRangesList object, that
>>> lets me successfully ignore the nested gene case, which is great.  However,
>>> with GRangesList I can't use the query-self comparison and therefore can't
>>> access those useful ignoreSelf and ignoreRedundant options.  I know I can
>>> workaround that too with some effort, but it'd be great to have it as part
>>> of the underlying code.
>>>
>>> Again, I've included some code below that should show what I'm trying to
>>> do.
>>>
>>> all the best,
>>>
>>> Janet
>>>
>>>
>>> library(rtracklayer)
>>>
>>> #### get some drosophila genes as a test case:
>>> mySession <- browserSession()
>>> genome(mySession) <- "dm3"
>>> genes <- ucscTableQuery (mySession, track="flyBaseGene",
>>> table="flyBaseGene")
>>> genes <- track(genes)
>>>
>>> #### reduce to a smaller example that contains a gene pair of the type
>>> I'm talking about (CG33797-RA is nested inside CG11152-RA)
>>> genes <- genes[148:152]
>>>
>>> #### remove strand info, as a workaround to not being able to specify
>>> ignore.strand for a query-self findOverlaps call
>>> strand(genes) <- "*"
>>>
>>> #### using findOverlaps on the genes themselves shows me the nested pair
>>> (query=3, subject=4)
>>> findOverlaps( genes, ignoreSelf = TRUE, ignoreRedundant = TRUE)
>>>
>>> #### so I'll use blocks to extract only the exonic portions of the genes
>>> as a GRangesList:
>>> genes_GRL <- blocks(genes)
>>>
>>> #### and use findOverlaps on that GRangesList object, first by specifying
>>> it as both query and subject in the comparison - this gives me more or less
>>> what I want (i.e. it does NOT show the nested pair 3-4), except that
>>> there's a bunch of filtering to do later.
>>> findOverlaps( genes_GRL, genes_GRL)
>>>
>>> #### ideally, to help me filter the hits I'd like to be able to use
>>> ignoreSelf and ignoreRedundant, but I can only use those if it's a
>>> query-self comparison (i.e. only works if no subject is specified)
>>> findOverlaps( genes_GRL, genes_GRL, ignoreSelf=TRUE, ignoreRedundant=TRUE)
>>> # Error in .local(query, subject, maxgap, minoverlap, type, select, ...) :
>>> #   unused arguments (ignoreSelf = TRUE, ignoreRedundant = TRUE)
>>>
>>> #### and it looks like findOverlaps is not implemented for the query-self
>>> case for GRangesList objects
>>> findOverlaps( genes_GRL)
>>> # Error in match.arg(type) : 'arg' must be of length 1
>>>
>>> sessionInfo()
>>>
>>> R version 3.1.0 Patched (2014-05-26 r65771)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>> locale:
>>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>>> [8] base
>>>
>>> other attached packages:
>>> [1] rtracklayer_1.25.11   GenomicRanges_1.17.17 GenomeInfoDb_1.1.6
>>> [4] IRanges_1.99.15       S4Vectors_0.0.8       BiocGenerics_0.11.2
>>>
>>> loaded via a namespace (and not attached):
>>>   [1] BatchJobs_1.2            BBmisc_1.6               BiocParallel_0.7.2
>>>   [4] Biostrings_2.33.10       bitops_1.0-6             brew_1.0-6
>>>   [7] codetools_0.2-8          DBI_0.2-7                digest_0.6.4
>>> [10] fail_1.2                 foreach_1.4.2
>>>   GenomicAlignments_1.1.13
>>> [13] iterators_1.0.7          plyr_1.8.1               Rcpp_0.11.2
>>> [16] RCurl_1.95-4.1           Rsamtools_1.17.23        RSQLite_0.11.4
>>> [19] sendmailR_1.1-2          stats4_3.1.0             stringr_0.6.2
>>> [22] tools_3.1.0              XML_3.98-1.1             XVector_0.5.6
>>> [25] zlibbioc_1.11.1
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>>
>>
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
    
    
More information about the Bioconductor
mailing list