[BioC] second GenomicRanges request - enable query-self comparison (and so ignoreSelf and ignoreRedundant) for GRangesList objects?

Janet Young jayoung at fhcrc.org
Fri Jun 20 19:58:52 CEST 2014


thanks very much - got it now, and it's working without the type="any".   Nice!

j

On Jun 18, 2014, at 1:50 PM, Valerie Obenchain <vobencha at fhcrc.org> wrote:

> I think Michael forgot to bump the version after his fix so the package didn't propagate. It's bumped now and 1.99.16 should be available w/ biocLite() by noon tomorrow or in svn immediately.
> 
> findOverlaps() now works on a GRangesList w/ no subject and the ignore* args as necessary. The default for 'type' is "any" for all findOverlaps() methods.
> 
> example(GRangesList) ## for 'grl'
>>> findOverlaps(grl)
>> Hits of length 3
>> queryLength: 3
>> subjectLength: 3
>>  queryHits subjectHits
>>   <integer>   <integer>
>> 1         1           1
>> 2         2           2
>> 3         3           3
> 
> 
> Valerie
> 
> On 06/17/2014 11:54 AM, Janet Young wrote:
>> Thanks very much all - yes, I can now get what I want by adding type="any":
>>      findOverlaps( genes_GRL, type="any", ignoreSelf=TRUE, ignoreRedundant=TRUE)
>> I guess in the long term that'll be unnecessary?
>  It seems to make sense to have a default of type="any", to be consistent with all the other findOverlaps methods.
>> 
>> Janet
>> 
>> 
>> 
>> 
>> On Jun 15, 2014, at 12:35 PM, Valerie Obenchain <vobencha at fhcrc.org> wrote:
>> 
>>> Great. Thanks.
>>> 
>>> Val
>>> 
>>> On 06/14/2014 10:27 PM, Michael Lawrence wrote:
>>>> A GRangesList is a Vector, so that method should work. It just neglected
>>>> to resolve the type argument with match.arg() before forwarding. That's
>>>> now fixed.
>>>> 
>>>> Michael
>>>> 
>>>> 
>>>> On Sat, Jun 14, 2014 at 6:27 PM, Valerie Obenchain <vobencha at fhcrc.org
>>>> <mailto:vobencha at fhcrc.org>> wrote:
>>>> 
>>>>    Hi Janet, Michael,
>>>> 
>>>>    Janet, were you able to get what you needed by specifying 'type'?
>>>> 
>>>>    Michael, which method are you referring to when you say it exists
>>>>    but is broken? The only method I see that allows a missing 'subject'
>>>>    is query="Vector". Maybe we need to add query="GRangesList"
>>>>    subject="missing"?
>>>> 
>>>> 
>>>>    Val
>>>> 
>>>> 
>>>> 
>>>>    On 06/11/2014 11:15 AM, Michael Lawrence wrote:
>>>> 
>>>>        Turns out I added something else. In this case, the method does
>>>>        exist, but
>>>>        it's just broken. You might have better luck if you specify the type
>>>>        argument to findOverlaps, like type="any".
>>>> 
>>>> 
>>>>        On Wed, Jun 11, 2014 at 10:52 AM, Janet Young <jayoung at fhcrc.org
>>>>        <mailto:jayoung at fhcrc.org>> wrote:
>>>> 
>>>>            I'm glad that's on your radar already - nice.  I updated all
>>>>            devel
>>>>            packages this morning - it's not there yet.  Here's my new
>>>>            sessionInfo():
>>>> 
>>>>                sessionInfo()
>>>> 
>>>>            R version 3.1.0 Patched (2014-05-26 r65771)
>>>>            Platform: x86_64-unknown-linux-gnu (64-bit)
>>>> 
>>>>            locale:
>>>>               [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>>               [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>>               [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>>>               [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>>               [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>>            [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>> 
>>>>            attached base packages:
>>>>            [1] parallel  stats     graphics  grDevices utils
>>>>            datasets  methods
>>>>            [8] base
>>>> 
>>>>            other attached packages:
>>>>            [1] rtracklayer_1.25.11   GenomicRanges_1.17.18
>>>>            GenomeInfoDb_1.1.7
>>>>            [4] IRanges_1.99.15       S4Vectors_0.0.8
>>>>            BiocGenerics_0.11.2
>>>> 
>>>>            loaded via a namespace (and not attached):
>>>>               [1] BatchJobs_1.2            BBmisc_1.6
>>>>            BiocParallel_0.7.2
>>>> 
>>>>               [4] Biostrings_2.33.10       bitops_1.0-6
>>>>            brew_1.0-6
>>>> 
>>>>               [7] codetools_0.2-8          DBI_0.2-7
>>>>              digest_0.6.4
>>>> 
>>>>            [10] fail_1.2                 foreach_1.4.2
>>>>            GenomicAlignments_1.1.14
>>>>            [13] iterators_1.0.7          plyr_1.8.1
>>>>            Rcpp_0.11.2
>>>> 
>>>>            [16] RCurl_1.95-4.1           Rsamtools_1.17.25
>>>>              RSQLite_0.11.4
>>>> 
>>>>            [19] sendmailR_1.1-2          stats4_3.1.0
>>>>            stringr_0.6.2
>>>> 
>>>>            [22] tools_3.1.0              XML_3.98-1.1
>>>>            XVector_0.5.6
>>>> 
>>>>            [25] zlibbioc_1.11.1
>>>> 
>>>>            On Jun 10, 2014, at 3:31 PM, Michael Lawrence
>>>>            <lawrence.michael at gene.com <mailto:lawrence.michael at gene.com>>
>>>>            wrote:
>>>> 
>>>>            I think I might have already added this a couple days ago to
>>>>            devel...
>>>>            tough to keep it all straight in my head
>>>> 
>>>> 
>>>>            On Tue, Jun 10, 2014 at 3:14 PM, Janet Young
>>>>            <jayoung at fhcrc.org <mailto:jayoung at fhcrc.org>> wrote:
>>>> 
>>>>                Hi again,
>>>> 
>>>>                This is somewhat related to another request I sent
>>>>                earlier this
>>>>                afternoon. Is it possible to implement query-self
>>>>                comparisons (i.e. where I
>>>>                specify query but not subject) for GRangesList objects?
>>>>                   The motivation is
>>>>                that I'd like to use the ignoreSelf and ignoreRedundant
>>>>                options in a
>>>>                GRangesList comparison.
>>>> 
>>>>                I mentioned in my other request that I'm looking through
>>>>                a set of genes
>>>>                to find pairs that overlap on opposite strands.  I'm now
>>>>                using findOverlaps
>>>>                on a GRangesList object instead of a GRanges object -
>>>>                that's because I want
>>>>                to only look at cases where parts of the final spliced
>>>>                transcript overlap,
>>>>                not cases where a large intron-containing gene has
>>>>                another smaller gene
>>>>                nested in an intron.  When I used findOverlaps on the
>>>>                entire genes at once
>>>>                as GRanges objects, my results included pairs of that
>>>>                nested type, but if I
>>>>                use the blocks function to get just the exons as a
>>>>                GRangesList object, that
>>>>                lets me successfully ignore the nested gene case, which
>>>>                is great.  However,
>>>>                with GRangesList I can't use the query-self comparison
>>>>                and therefore can't
>>>>                access those useful ignoreSelf and ignoreRedundant
>>>>                options.  I know I can
>>>>                workaround that too with some effort, but it'd be great
>>>>                to have it as part
>>>>                of the underlying code.
>>>> 
>>>>                Again, I've included some code below that should show
>>>>                what I'm trying to
>>>>                do.
>>>> 
>>>>                all the best,
>>>> 
>>>>                Janet
>>>> 
>>>> 
>>>>                library(rtracklayer)
>>>> 
>>>>                #### get some drosophila genes as a test case:
>>>>                mySession <- browserSession()
>>>>                genome(mySession) <- "dm3"
>>>>                genes <- ucscTableQuery (mySession, track="flyBaseGene",
>>>>                table="flyBaseGene")
>>>>                genes <- track(genes)
>>>> 
>>>>                #### reduce to a smaller example that contains a gene
>>>>                pair of the type
>>>>                I'm talking about (CG33797-RA is nested inside CG11152-RA)
>>>>                genes <- genes[148:152]
>>>> 
>>>>                #### remove strand info, as a workaround to not being
>>>>                able to specify
>>>>                ignore.strand for a query-self findOverlaps call
>>>>                strand(genes) <- "*"
>>>> 
>>>>                #### using findOverlaps on the genes themselves shows me
>>>>                the nested pair
>>>>                (query=3, subject=4)
>>>>                findOverlaps( genes, ignoreSelf = TRUE, ignoreRedundant
>>>>                = TRUE)
>>>> 
>>>>                #### so I'll use blocks to extract only the exonic
>>>>                portions of the genes
>>>>                as a GRangesList:
>>>>                genes_GRL <- blocks(genes)
>>>> 
>>>>                #### and use findOverlaps on that GRangesList object,
>>>>                first by specifying
>>>>                it as both query and subject in the comparison - this
>>>>                gives me more or less
>>>>                what I want (i.e. it does NOT show the nested pair 3-4),
>>>>                except that
>>>>                there's a bunch of filtering to do later.
>>>>                findOverlaps( genes_GRL, genes_GRL)
>>>> 
>>>>                #### ideally, to help me filter the hits I'd like to be
>>>>                able to use
>>>>                ignoreSelf and ignoreRedundant, but I can only use those
>>>>                if it's a
>>>>                query-self comparison (i.e. only works if no subject is
>>>>                specified)
>>>>                findOverlaps( genes_GRL, genes_GRL, ignoreSelf=TRUE,
>>>>                ignoreRedundant=TRUE)
>>>>                # Error in .local(query, subject, maxgap, minoverlap,
>>>>                type, select, ...) :
>>>>                #   unused arguments (ignoreSelf = TRUE, ignoreRedundant
>>>>                = TRUE)
>>>> 
>>>>                #### and it looks like findOverlaps is not implemented
>>>>                for the query-self
>>>>                case for GRangesList objects
>>>>                findOverlaps( genes_GRL)
>>>>                # Error in match.arg(type) : 'arg' must be of length 1
>>>> 
>>>>                sessionInfo()
>>>> 
>>>>                R version 3.1.0 Patched (2014-05-26 r65771)
>>>>                Platform: x86_64-unknown-linux-gnu (64-bit)
>>>> 
>>>>                locale:
>>>>                   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>>                   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>>                   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>>>                   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>>                   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>>                [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>> 
>>>>                attached base packages:
>>>>                [1] parallel  stats     graphics  grDevices utils
>>>>                datasets  methods
>>>>                [8] base
>>>> 
>>>>                other attached packages:
>>>>                [1] rtracklayer_1.25.11   GenomicRanges_1.17.17
>>>>                GenomeInfoDb_1.1.6
>>>>                [4] IRanges_1.99.15       S4Vectors_0.0.8
>>>>                BiocGenerics_0.11.2
>>>> 
>>>>                loaded via a namespace (and not attached):
>>>>                   [1] BatchJobs_1.2            BBmisc_1.6
>>>>                BiocParallel_0.7.2
>>>>                   [4] Biostrings_2.33.10       bitops_1.0-6
>>>>                brew_1.0-6
>>>>                   [7] codetools_0.2-8          DBI_0.2-7
>>>>                  digest_0.6.4
>>>>                [10] fail_1.2                 foreach_1.4.2
>>>>                   GenomicAlignments_1.1.13
>>>>                [13] iterators_1.0.7          plyr_1.8.1
>>>>                Rcpp_0.11.2
>>>>                [16] RCurl_1.95-4.1           Rsamtools_1.17.23
>>>>                  RSQLite_0.11.4
>>>>                [19] sendmailR_1.1-2          stats4_3.1.0
>>>>                stringr_0.6.2
>>>>                [22] tools_3.1.0              XML_3.98-1.1
>>>>                XVector_0.5.6
>>>>                [25] zlibbioc_1.11.1
>>>> 
>>>>                _________________________________________________
>>>>                Bioconductor mailing list
>>>>                Bioconductor at r-project.org
>>>>                <mailto:Bioconductor at r-project.org>
>>>>                https://stat.ethz.ch/mailman/__listinfo/bioconductor
>>>>                <https://stat.ethz.ch/mailman/listinfo/bioconductor>
>>>>                Search the archives:
>>>>                http://news.gmane.org/gmane.__science.biology.informatics.__conductor
>>>>                <http://news.gmane.org/gmane.science.biology.informatics.conductor>
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>>                 [[alternative HTML version deleted]]
>>>> 
>>>> 
>>>>        _________________________________________________
>>>>        Bioconductor mailing list
>>>>        Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
>>>>        https://stat.ethz.ch/mailman/__listinfo/bioconductor
>>>>        <https://stat.ethz.ch/mailman/listinfo/bioconductor>
>>>>        Search the archives:
>>>>        http://news.gmane.org/gmane.__science.biology.informatics.__conductor
>>>>        <http://news.gmane.org/gmane.science.biology.informatics.conductor>
>>>> 
>>>> 
>>>> 
>>> 
>> 
> 



More information about the Bioconductor mailing list