[Bioc-devel] [devteam-bioc] Error in findOverlaps

Ou, Jianhong Jianhong.Ou at umassmed.edu
Mon Jul 29 16:16:28 CEST 2013


Dear Hervé Pagès,

Thank you very much for your email. This make me clear about the exactly
mean of maxgap and minoverlap. Previous what I think is that maxgap is the
max gap between the both ends of query and subject. Now I understand this
only is the situation when we set the parameter type="all". And I checked
the findOverlaps of GenomicRanges, they keep same definition of start and
end for type. Great.

So as you say, to set minoverlap=1L is only suitable for type="any" and
"maxgap" is set to a value greater than 0.

Yours sincerely,

Jianhong Ou

LRB 670A
Program in Gene Function and Expression
364 Plantation Street Worcester,
MA 01605




On 7/26/13 2:48 AM, "Hervé Pagès" <hpages at fhcrc.org> wrote:

>
>
>On 07/25/2013 11:32 PM, Maintainer wrote:
>> Hi Jianhong,
>>
>> On 07/24/2013 08:18 AM, Ou, Jianhong wrote:
>>> Dear Hervé,
>>>
>>> Is it possible to ignore minoverlap when people set maxgap greater
>>>than 0
>>> and give a message for that?
>>
>> That would make sense to me but we're probably overlooking something
>> because Michael has a unit test where 'minoverlap' and 'maxgap' are
>> used together:
>>
>>     query <- IRanges(c(1, 5, 3, 4), width=c(2, 2, 4, 6))
>>     subject <- IRanges(c(1, 3, 5, 6), width=c(4, 4, 5, 4))
>>     tree <- IntervalTree(subject)
>>     result <- findOverlaps(query, tree, type = "start", maxgap = 1L,
>>                            minoverlap = 3L)
>>
>>     > result
>>     Hits of length 3
>>     queryLength: 4
>>     subjectLength: 4
>>       queryHits subjectHits
>>        <integer>   <integer>
>>      1         3           2
>>      2         4           2
>>      3         4           3
>>
>> This 'result' is what the unit test expects so we would need to
>> understand what the meaning of using those 2 arguments together
>> is (the man page is not really clear about it).
>
>Checking again and paying more attention to the man page, 'maxgap'
>has a special meaning when 'type' is not "any". If type="start" it
>specifies the maximum difference in the starts.
>
>So I guess the question that remains is what to do with 'minoverlap'
>when type="any" and 'maxgap' is set to a value greater than 0. I'm
>OK with your proposal of ignoring it with a warning (unless the
>user also specifies 'minoverlap=0L', then no warning of course).
>
>H.
>
>
>>
>> Michael?
>>
>> Thanks,
>> H.
>>
>>>
>>> Yours sincerely,
>>>
>>> Jianhong Ou
>>>
>>> LRB 670A
>>> Program in Gene Function and Expression
>>> 364 Plantation Street Worcester,
>>> MA 01605
>>>
>>>
>>>
>>>
>>> On 7/23/13 5:35 PM, "Hervé Pagès" <hpages at fhcrc.org> wrote:
>>>
>>>> Hi,
>>>>
>>>> This fails in Bioc-devel only and the following fix "seems" to address
>>>> the problem:
>>>>
>>>>     hpages at thinkpad:~/svn/bioconductor/Rpacks/IRanges$ svn diff
>>>> R/findOverlaps-methods.R
>>>>     Index: R/findOverlaps-methods.R
>>>>     
>>>>===================================================================
>>>>     --- R/findOverlaps-methods.R	(revision 78801)
>>>>     +++ R/findOverlaps-methods.R	(working copy)
>>>>     @@ -142,7 +142,7 @@
>>>>                  # preprocess query
>>>>                  preprocRes <- .preProcess_findOverlaps_query(query,
>>>> maxgap, minoverlap)
>>>>                  origQuery <- preprocRes$origQuery
>>>>     -            unsortedQuery <- preprocRes$origQuery
>>>>     +            unsortedQuery <- preprocRes$unsortedQuery
>>>>                  query <- preprocRes$query
>>>>                  query_ord <- preprocRes$query_ord
>>>>
>>>> However a *real* fix would be to actually clarify what's the meaning
>>>>of
>>>> using 'maxgap' and 'minoverlap' together. Right now the result below
>>>> doesn't make much sense to me (using Bioc-release):
>>>>
>>>>     > findOverlaps(IRanges(1, 5), IRanges(7, 10), maxgap=3,
>>>>minoverlap=2)
>>>>     Hits of length 0
>>>>     queryLength: 1
>>>>     subjectLength: 1
>>>>
>>>>     > findOverlaps(IRanges(1, 5), IRanges(7, 10), maxgap=4,
>>>>minoverlap=2)
>>>>     Hits of length 1
>>>>     queryLength: 1
>>>>     subjectLength: 1
>>>>       queryHits subjectHits
>>>>        <integer>   <integer>
>>>>      1         1           1
>>>>
>>>> H.
>>>>
>>>>
>>>> On 07/23/2013 12:08 PM, Maintainer wrote:
>>>>> Hi,
>>>>>
>>>>> I got "negative widths are not allowed" error when I use
>>>>>findOverlaps.
>>>>> Could anybody help me to figure this out? Thanks. Here is the code:
>>>>>
>>>>>    > library("IRanges")
>>>>> Loading required package: BiocGenerics
>>>>> Loading required package: parallel
>>>>>
>>>>> Attaching package: ŒBiocGenerics¹
>>>>>
>>>>> The following objects are masked from Œpackage:parallel¹:
>>>>>
>>>>>        clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
>>>>>        clusterExport, clusterMap, parApply, parCapply, parLapply,
>>>>>        parLapplyLB, parRapply, parSapply, parSapplyLB
>>>>>
>>>>> The following object is masked from Œpackage:stats¹:
>>>>>
>>>>>        xtabs
>>>>>
>>>>> The following objects are masked from Œpackage:base¹:
>>>>>
>>>>>        anyDuplicated, append, as.data.frame, as.vector, cbind,
>>>>>colnames,
>>>>>        duplicated, eval, Filter, Find, get, intersect, lapply, Map,
>>>>>        mapply, match, mget, order, paste, pmax, pmax.int, pmin,
>>>>>pmin.int,
>>>>>        Position, rank, rbind, Reduce, rep.int, rownames, sapply,
>>>>>setdiff,
>>>>>        sort, table, tapply, union, unique, unlist
>>>>>
>>>>>    > Ranges1 <- sort(IRanges(start=c(3123260, 167889600),
>>>>>end=c(3123360,
>>>>> 167893599), names=c("Site5", "Site12")))
>>>>>    > Ranges2 <- sort(IRanges(start=c(312326, 3123260, 167888600),
>>>>> end=c(312586, 3123470, 167888999), names=c("t11", "t5", "t17")))
>>>>>    > tree = IntervalTree(Ranges2)
>>>>>    > findOverlaps(tree, query = Ranges1, maxgap = 1000, minoverlap =
>>>>>20,
>>>>> select = "first")
>>>>> Error in .Call2("solve_user_SEW0", start, end, width, PACKAGE =
>>>>> "IRanges") :
>>>>>      solving row 2: negative widths are not allowed
>>>>>    > traceback()
>>>>> 11: .Call(.NAME, ..., PACKAGE = PACKAGE)
>>>>> 10: .Call2("solve_user_SEW0", start, end, width, PACKAGE = "IRanges")
>>>>> 9: solveUserSEW0(start = start, end = end, width = width)
>>>>> 8: IRanges(pmax.int(qstart, sstart), pmin.int(send, qend))
>>>>> 7: .local(x, ...)
>>>>> 6: ranges(result, unsortedQuery, subject)
>>>>> 5: ranges(result, unsortedQuery, subject)
>>>>> 4: .postProcess_findOverlaps_result(result, unsortedQuery, origQuery,
>>>>>           subject, type, minoverlap, maxgap, origSelect)
>>>>> 3: .local(query, subject, maxgap, minoverlap, type, select, ...)
>>>>> 2: findOverlaps(tree, query = Ranges1, maxgap = 1000, minoverlap =
>>>>>20,
>>>>>           select = "first")
>>>>> 1: findOverlaps(tree, query = Ranges1, maxgap = 1000, minoverlap =
>>>>>20,
>>>>>           select = "first")
>>>>>    > sessionInfo()
>>>>> R Under development (unstable) (2013-04-30 r62697)
>>>>> Platform: x86_64-apple-darwin12.3.0 (64-bit)
>>>>>
>>>>> locale:
>>>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>>>
>>>>> attached base packages:
>>>>> [1] parallel  stats     graphics  grDevices utils     datasets
>>>>>methods
>>>>> [8] base
>>>>>
>>>>> other attached packages:
>>>>> [1] IRanges_1.19.19    BiocGenerics_0.7.3
>>>>>
>>>>> loaded via a namespace (and not attached):
>>>>> [1] stats4_3.1.0
>>>>>
>>>>>
>>>>> Yours sincerely,
>>>>>
>>>>> Jianhong Ou
>>>>>
>>>>> LRB 670A
>>>>> Program in Gene Function and Expression
>>>>> 364 Plantation Street Worcester,
>>>>> MA 01605
>>>>>
>>>>>
>>>>> 
>>>>>______________________________________________________________________
>>>>>__
>>>>> devteam-bioc mailing list
>>>>> To unsubscribe from this mailing list send a blank email to
>>>>> devteam-bioc-leave at lists.fhcrc.org
>>>>> You can also unsubscribe or change your personal options at
>>>>> https://lists.fhcrc.org/mailman/listinfo/devteam-bioc
>>>>>
>>>>
>>>> --
>>>> Hervé Pagès
>>>>
>>>> Program in Computational Biology
>>>> Division of Public Health Sciences
>>>> Fred Hutchinson Cancer Research Center
>>>> 1100 Fairview Ave. N, M1-B514
>>>> P.O. Box 19024
>>>> Seattle, WA 98109-1024
>>>>
>>>> E-mail: hpages at fhcrc.org
>>>> Phone:  (206) 667-5791
>>>> Fax:    (206) 667-1319
>>>
>>
>
>-- 
>Hervé Pagès
>
>Program in Computational Biology
>Division of Public Health Sciences
>Fred Hutchinson Cancer Research Center
>1100 Fairview Ave. N, M1-B514
>P.O. Box 19024
>Seattle, WA 98109-1024
>
>E-mail: hpages at fhcrc.org
>Phone:  (206) 667-5791
>Fax:    (206) 667-1319



More information about the Bioc-devel mailing list