I think I might have already added this a couple days ago to devel... tough
to keep it all straight in my head


On Tue, Jun 10, 2014 at 3:14 PM, Janet Young <jayoung@fhcrc.org> wrote:

> Hi again,
>
> This is somewhat related to another request I sent earlier this afternoon.
> Is it possible to implement query-self comparisons (i.e. where I specify
> query but not subject) for GRangesList objects?   The motivation is that
> I'd like to use the ignoreSelf and ignoreRedundant options in a GRangesList
> comparison.
>
> I mentioned in my other request that I'm looking through a set of genes to
> find pairs that overlap on opposite strands.  I'm now using findOverlaps on
> a GRangesList object instead of a GRanges object - that's because I want to
> only look at cases where parts of the final spliced transcript overlap, not
> cases where a large intron-containing gene has another smaller gene nested
> in an intron.  When I used findOverlaps on the entire genes at once as
> GRanges objects, my results included pairs of that nested type, but if I
> use the blocks function to get just the exons as a GRangesList object, that
> lets me successfully ignore the nested gene case, which is great.  However,
> with GRangesList I can't use the query-self comparison and therefore can't
> access those useful ignoreSelf and ignoreRedundant options.  I know I can
> workaround that too with some effort, but it'd be great to have it as part
> of the underlying code.
>
> Again, I've included some code below that should show what I'm trying to
> do.
>
> all the best,
>
> Janet
>
>
> library(rtracklayer)
>
> #### get some drosophila genes as a test case:
> mySession <- browserSession()
> genome(mySession) <- "dm3"
> genes <- ucscTableQuery (mySession, track="flyBaseGene",
> table="flyBaseGene")
> genes <- track(genes)
>
> #### reduce to a smaller example that contains a gene pair of the type I'm
> talking about (CG33797-RA is nested inside CG11152-RA)
> genes <- genes[148:152]
>
> #### remove strand info, as a workaround to not being able to specify
> ignore.strand for a query-self findOverlaps call
> strand(genes) <- "*"
>
> #### using findOverlaps on the genes themselves shows me the nested pair
> (query=3, subject=4)
> findOverlaps( genes, ignoreSelf = TRUE, ignoreRedundant = TRUE)
>
> #### so I'll use blocks to extract only the exonic portions of the genes
> as a GRangesList:
> genes_GRL <- blocks(genes)
>
> #### and use findOverlaps on that GRangesList object, first by specifying
> it as both query and subject in the comparison - this gives me more or less
> what I want (i.e. it does NOT show the nested pair 3-4), except that
> there's a bunch of filtering to do later.
> findOverlaps( genes_GRL, genes_GRL)
>
> #### ideally, to help me filter the hits I'd like to be able to use
> ignoreSelf and ignoreRedundant, but I can only use those if it's a
> query-self comparison (i.e. only works if no subject is specified)
> findOverlaps( genes_GRL, genes_GRL, ignoreSelf=TRUE, ignoreRedundant=TRUE)
> # Error in .local(query, subject, maxgap, minoverlap, type, select, ...) :
> #   unused arguments (ignoreSelf = TRUE, ignoreRedundant = TRUE)
>
> #### and it looks like findOverlaps is not implemented for the query-self
> case for GRangesList objects
> findOverlaps( genes_GRL)
> # Error in match.arg(type) : 'arg' must be of length 1
>
> sessionInfo()
>
> R version 3.1.0 Patched (2014-05-26 r65771)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
> [1] rtracklayer_1.25.11   GenomicRanges_1.17.17 GenomeInfoDb_1.1.6
> [4] IRanges_1.99.15       S4Vectors_0.0.8       BiocGenerics_0.11.2
>
> loaded via a namespace (and not attached):
>  [1] BatchJobs_1.2            BBmisc_1.6               BiocParallel_0.7.2
>  [4] Biostrings_2.33.10       bitops_1.0-6             brew_1.0-6
>  [7] codetools_0.2-8          DBI_0.2-7                digest_0.6.4
> [10] fail_1.2                 foreach_1.4.2
>  GenomicAlignments_1.1.13
> [13] iterators_1.0.7          plyr_1.8.1               Rcpp_0.11.2
> [16] RCurl_1.95-4.1           Rsamtools_1.17.23        RSQLite_0.11.4
> [19] sendmailR_1.1-2          stats4_3.1.0             stringr_0.6.2
> [22] tools_3.1.0              XML_3.98-1.1             XVector_0.5.6
> [25] zlibbioc_1.11.1
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>

	[[alternative HTML version deleted]]

