[BioC] GenomicRanges:::.similarSeqnameConvention regular expressions needs some tweaking?
Hervé Pagès
hpages at fhcrc.org
Sat Aug 28 04:40:13 CEST 2010
Hi Steve,
Until we come up with the perfect heuristic for this, I've modified
findOverlaps() and its related methods so that they issue a warning
instead of an error when 'query' and 'subject' don't appear to use
a similar seqname convention.
This is in GenomicRanges version 1.0.9 (release) and 1.1.23 (devel).
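For calling code, the practical effect can be sketched as follows. This is a minimal base-R illustration, not the GenomicRanges implementation: `check_convention()` and its `similar` argument are hypothetical stand-ins for the internal check. They are used only to show that a warning, unlike an error, lets the call finish while the caller can still record or silence it.

```r
# Hypothetical stand-in for the internal seqname check (NOT the real
# GenomicRanges code): warn instead of stopping when conventions differ.
check_convention <- function(similar) {
  if (!similar)
    warning("'query' and 'subject' don't appear to use a similar seqname convention")
  "overlaps computed"  # placeholder for the real result
}

# With a warning, the call still returns; the caller can record the
# message and carry on.
messages <- character(0)
result <- withCallingHandlers(
  check_convention(similar = FALSE),
  warning = function(w) {
    messages <<- c(messages, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# Or silence it once the seqnames are known to be comparable.
result2 <- suppressWarnings(check_convention(similar = FALSE))
```

Had the check still called stop(), neither `result` nor `result2` would have been assigned without wrapping the call in tryCatch().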
Cheers,
H.
On 08/24/2010 09:35 AM, Steve Lianoglou wrote:
> Hi Martin,
>
> On Tue, Aug 24, 2010 at 12:14 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>> On 08/24/2010 08:16 AM, Steve Lianoglou wrote:
>>> Hi,
>>>
>>> Sorry to be a pest about this, but could we get some traction on this?
>>>
>>> As a workaround, I've temporarily commented out the isArabic regex
>>> test, but I want to keep my own analysis code in line w/ the real
>>> GenomicRanges package.
>>
>> We've discussed this locally and will make changes this week. Martin
>
> Sweet.
>
> Thanks Martin (+ co),
>
> -steve
>
>>
>>>
>>> Thanks,
>>> -steve
>>>
>>>
>>> On Fri, Aug 20, 2010 at 12:53 PM, Steve Lianoglou
>>> <mailinglist.honeypot at gmail.com> wrote:
>>>> Hi all,
>>>>
>>>> The GenomicRanges:::.similarSeqnameConvention function is returning
>>>> FALSE where, IMHO, it shouldn't be.
>>>>
>>>> I've landed in a situation where this function is called with the
>>>> following values for seqs1/2:
>>>>
>>>> seqs1:
>>>>  [1] "chr1" "chr1_random" "chr10" "chr10_random" "chr11" "chr11_random"
>>>>  [7] "chr12" "chr13" "chr13_random" "chr14" "chr15" "chr15_random"
>>>> [13] "chr16" "chr16_random" "chr17" "chr17_random" "chr18" "chr18_random"
>>>> [19] "chr19" "chr19_random" "chr2" "chr2_random" "chr20" "chr21"
>>>> [25] "chr21_random" "chr22" "chr22_random" "chr22_h2_hap1" "chr3" "chr3_random"
>>>> [31] "chr4" "chr4_random" "chr5" "chr5_random" "chr5_h2_hap1" "chr6"
>>>> [37] "chr6_random" "chr6_cox_hap1" "chr6_qbl_hap2" "chr7" "chr7_random" "chr8"
>>>> [43] "chr8_random" "chr9" "chr9_random" "chrM" "chrX" "chrX_random"
>>>> [49] "chrY"
>>>>
>>>> seqs2:
>>>> [1] "chrY"
>>>>
>>>> and it looks like the "isArabic" function in funList is the culprit
>>>> here. Perhaps this regex test isn't so necessary, given all the other
>>>> tests that are being run?
>>>>
>>>> I guess it's not so easy to come up w/ a perfect heuristic for this
>>>> function to check "comparable seqnames", but IMHO, my scenario
>>>> should pass as "good" (i.e., the conventions are similar).
>>>>
>>>> Another option would be to just have this function return TRUE when
>>>> the intersection between seqs1 and seqs2 is non-empty. I guess that
>>>> must be too simple though ...
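
A minimal sketch of that last idea, in base R. The function name `shareSeqnames` is made up for illustration and is not part of GenomicRanges. It reads the suggestion as "treat the conventions as similar whenever the two sets of seqnames have at least one name in common", which is the case for the seqs1/seqs2 values above (both contain "chrY"). When no names are shared, the sketch returns NA so that other heuristics would still have to decide.

```r
# Hypothetical helper (NOT GenomicRanges code): TRUE if any seqname is
# shared, NA if the question is left open for other heuristics.
shareSeqnames <- function(seqs1, seqs2) {
  if (length(intersect(seqs1, seqs2)) > 0L) TRUE else NA
}

# Shortened version of the seqs1 vector from the report above.
seqs1 <- c("chr1", "chr1_random", "chr6_cox_hap1", "chrM", "chrX", "chrY")
seqs2 <- "chrY"

shareSeqnames(seqs1, seqs2)         # TRUE: "chrY" is in both
shareSeqnames(c("1", "2"), "chrY")  # NA: no shared names
```

A shared name is only evidence of a common convention, not proof, so this would probably work best as a shortcut placed before the regex-based tests rather than as a replacement for them.
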
>>>>
>>>> --
>>>> Steve Lianoglou
>>>> Graduate Student: Computational Systems Biology
>>>> | Memorial Sloan-Kettering Cancer Center
>>>> | Weill Medical College of Cornell University
>>>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>>>
>>>
>>>
>>>
>>
>>
>> --
>> Martin Morgan
>> Computational Biology / Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N.
>> PO Box 19024 Seattle, WA 98109
>>
>> Location: Arnold Building M1 B861
>> Phone: (206) 667-2793
>>
>
>
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319