[Bioc-sig-seq] GRanges, failure assigning chromosome lengths

Ivan Gregoretti ivangreg at gmail.com
Mon Apr 26 23:17:06 CEST 2010


Hi Patrick.

Sure. Here is the output. To the best of my understanding, most
aligners assume that, for instance, 'ch4_random' is just a linear
continuation of 'chr4'.

Thank you,

Ivan


> range(A)
GRanges with 67 ranges and 0 elementMetadata values
         seqnames               ranges strand   |
            <Rle>            <IRanges>  <Rle>   |
 [1]         chr1 [3000311, 197195093]      +   |
 [2]         chr1 [2999944, 197195305]      -   |
 [3]        chr10 [3000180, 129992771]      +   |
 [4]        chr10 [3000092, 129993250]      -   |
 [5]        chr11 [3000006, 121843794]      +   |
 [6]        chr11 [3000010, 121843708]      -   |
 [7]        chr12 [3000095, 121257390]      +   |
 [8]        chr12 [3000144, 121257410]      -   |
 [9]        chr13 [3000434, 120284396]      +   |
 ...          ...                  ...    ... ...
[59] chrUn_random [    -21,   5900319]      -   |
[60]         chrX [3000042, 166627000]      +   |
[61]         chrX [2999944, 166650266]      -   |
[62]  chrX_random [    326,   1051594]      +   |
[63]  chrX_random [    229,   1309438]      -   |
[64]         chrY [    150,   2902454]      +   |
[65]         chrY [     -2,   2902181]      -   |
[66]  chrY_random [   7539,  58497416]      +   |
[67]  chrY_random [   1247,  58489658]      -   |

seqlengths
         chr1        chr10        chr11 ...         chrY  chrY_random
           NA           NA           NA ...           NA           NA


> range(A[seqnames(A)=='chrM'])
GRanges with 2 ranges and 0 elementMetadata values
    seqnames       ranges strand |
       <Rle>    <IRanges>  <Rle> |
[1]     chrM [  4, 16387]      + |
[2]     chrM [-82, 16298]      - |


> range(B)
GRanges with 65 ranges and 0 elementMetadata values
         seqnames               ranges strand   |
            <Rle>            <IRanges>  <Rle>   |
 [1]         chr1 [3000506, 197194916]      +   |
 [2]         chr1 [3000977, 197194382]      -   |
 [3]        chr10 [3000626, 129989507]      +   |
 [4]        chr10 [3000643, 129988346]      -   |
 [5]        chr11 [3000057, 121843783]      +   |
 [6]        chr11 [3000010, 121843699]      -   |
 [7]        chr12 [3001370, 121256864]      +   |
 [8]        chr12 [3000135, 121257132]      -   |
 [9]        chr13 [3000103, 120284379]      +   |
 ...          ...                  ...    ... ...
[57] chrUn_random [    125,   5900272]      -   |
[58]         chrX [3001345, 166614519]      +   |
[59]         chrX [3000031, 166628394]      -   |
[60]  chrX_random [   1177,   1051588]      +   |
[61]  chrX_random [    254,   1051506]      -   |
[62]         chrY [    141,   2902458]      +   |
[63]         chrY [    744,   2901128]      -   |
[64]  chrY_random [  39555,  58472198]      +   |
[65]  chrY_random [  71505,  58487987]      -   |

seqlengths
         chr1        chr10        chr11 ...         chrY  chrY_random
           NA           NA           NA ...           NA           NA


> range(B[seqnames(B)=='chrM'])
GRanges with 2 ranges and 0 elementMetadata values
    seqnames       ranges strand |
       <Rle>    <IRanges>  <Rle> |
[1]     chrM [  5, 16385]      + |
[2]     chrM [-77, 16299]      - |


> range(C)
GRanges with 67 ranges and 0 elementMetadata values
         seqnames               ranges strand   |
            <Rle>            <IRanges>  <Rle>   |
 [1]         chr1 [3000125, 197195029]      +   |
 [2]         chr1 [2999981, 197194997]      -   |
 [3]        chr10 [3000098, 129993293]      +   |
 [4]        chr10 [3000105, 129993219]      -   |
 [5]        chr11 [3000007, 121843789]      +   |
 [6]        chr11 [3000010, 121843707]      -   |
 [7]        chr12 [3000113, 121257571]      +   |
 [8]        chr12 [3000011, 121257521]      -   |
 [9]        chr13 [3000163, 120284384]      +   |
 ...          ...                  ...    ... ...
[59] chrUn_random [    -75,   5900330]      -   |
[60]         chrX [3000056, 166650160]      +   |
[61]         chrX [2999951, 166650076]      -   |
[62]  chrX_random [    327,   1200747]      +   |
[63]  chrX_random [    236,   1508524]      -   |
[64]         chrY [     38,   2902563]      +   |
[65]         chrY [    -80,   2902221]      -   |
[66]  chrY_random [   5699,  58492209]      +   |
[67]  chrY_random [   7154,  58486839]      -   |

seqlengths
         chr1        chr10        chr11 ...         chrY  chrY_random
           NA           NA           NA ...           NA           NA


> range(C[seqnames(C)=='chrM'])
GRanges with 2 ranges and 0 elementMetadata values
    seqnames       ranges strand |
       <Rle>    <IRanges>  <Rle> |
[1]     chrM [  1, 16387]      + |
[2]     chrM [-81, 16332]      - |


library(BSgenome.Mmusculus.UCSC.mm9)
> length(Mmusculus$chrM)
[1] 16299



Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health



On Mon, Apr 26, 2010 at 4:35 PM, Patrick Aboyoun <paboyoun at fhcrc.org> wrote:
> Ivan,
> Could you provide me with the results of
>
> range(Z)
>
> for all three of the GRanges objects that seqlengths<- throws and error on?
> I would like to see if there is some adjustment we can make to the
> seqlengths<- function that will resolve your issue.
>
> Thanks,
> Patrick
>
>
> On 4/26/10 12:57 PM, Ivan Gregoretti wrote:
>>
>> Hi Patrick,
>>
>> You are correct. The validity check is rejecting the input. The
>> problem with that is that I have three set, all of them failing perhap
>> because one or two tags are out of bounds.
>>
>> This tags are coming straight from the Illumina sequencer.  There is
>> no manipulation. Perhaps it is complaining because the validity check
>> does not know that Mitochondrial DNA is circular.
>>
>> Bottom line, anybody attempting to load the data as GRanges from a
>> high coverage tag set will find this seqlengths() rejection.
>>
>> Is there any way to override it or should I just refrain from
>> assigning chromosome lengths? The online vignette does not mention if
>> there is a switch to the validity check.
>>
>> Thank you,
>>
>> Ivan
>>
>>
>>
>> Ivan Gregoretti, PhD
>> National Institute of Diabetes and Digestive and Kidney Diseases
>> National Institutes of Health
>> 5 Memorial Dr, Building 5, Room 205.
>> Bethesda, MD 20892. USA.
>> Phone: 1-301-496-1592
>> Fax: 1-301-496-9878
>>
>>
>>
>> On Mon, Apr 26, 2010 at 3:42 PM, Patrick Aboyoun<paboyoun at fhcrc.org>
>>  wrote:
>>
>>>
>>> Ivan,
>>> As you probably already realized, the first error you encountered was do
>>> to
>>> a misuse of the seqlengths function since function objects (e.g.
>>> BSgenome)
>>> have no sequence lengths.
>>>
>>> # Here is the problem
>>> seqlengths(Z)<- seqlengths(BSgenome)[names(seqlengths(Z))]
>>> Error in function (classes, fdef, mtable) :
>>> unable to find an inherited method for function "seqlengths", for
>>> signature "function"
>>>
>>> The second error is the result of a failed validity check on the modified
>>> object. All ranges stored in a GRanges object must be between 1 and
>>> seqlengths(object) if the seqlengths information is non-NAs.
>>>
>>> # Here is a second attempt that also fails
>>> seqlengths(Z)<- seqlengths(Mmusculus)[names(seqlengths(Z))]
>>> Error in validObject(.Object) :
>>> invalid class "GRanges" object: slot 'ranges' contains values
>>> outside of sequence bounds
>>>
>>> Compare the results of range(Z), which returns the min start and max end
>>> for
>>> each of the seqnames in Z, and compare it with
>>> seqlengths(Mmusculus)[names(seqlengths(Z))]. This should provide you with
>>> some insight as to which ranges are out of bounds. Perhaps your intervals
>>> are 0-based instead of 1-based?
>>>
>>>
>>> Cheers,
>>> Patrick
>>>
>>>
>>> On 4/26/10 11:08 AM, Ivan Gregoretti wrote:
>>>
>>>>
>>>> Hello listers,
>>>>
>>>> Is anybody having trouble assigning seqlengths() to a GRanges instance
>>>> with the new GenomicRanges version?
>>>>
>>>>
>>>> This morning I upgraded my GenomicRanges from 0.0.9 to 0.1.17 and
>>>> since then I am unable to assign chromosome lengths to any of my tag
>>>> sets from my Illumina 36 nucleotide sequences.
>>>>
>>>> On Friday this worked. Let me show you how it complains now:
>>>>
>>>> Z<- import('millionsoftags.bed.gz', 'bed')
>>>>
>>>> Z<- as(Z, 'GRanges')
>>>>
>>>> class(Z)
>>>> [1] "GRanges"
>>>> attr(,"package")
>>>> [1] "GenomicRanges"
>>>>
>>>> Z
>>>> GRanges with 23293177 ranges and 2 elementMetadata values
>>>>               seqnames               ranges strand   |
>>>>                  <Rle>               <IRanges>     <Rle>      |
>>>>        [1]        chr1   [3000506, 3000541]      +   |
>>>>        [2]        chr1   [3001061, 3001096]      -   |
>>>>        [3]        chr1   [3001075, 3001110]      -   |
>>>>        [4]        chr1   [3001098, 3001133]      +   |
>>>>        [5]        chr1   [3001310, 3001345]      +   |
>>>>        [6]        chr1   [3001559, 3001594]      +   |
>>>>        [7]        chr1   [3001603, 3001638]      +   |
>>>>        [8]        chr1   [3001603, 3001638]      +   |
>>>>        [9]        chr1   [3001609, 3001644]      -   |
>>>>        ...         ...                  ...    ... ...
>>>> [23293169] chrY_random [58402685, 58402720]      +   |
>>>> [23293170] chrY_random [58403358, 58403393]      +   |
>>>> [23293171] chrY_random [58406154, 58406189]      +   |
>>>> [23293172] chrY_random [58411077, 58411112]      -   |
>>>> [23293173] chrY_random [58430677, 58430712]      +   |
>>>> [23293174] chrY_random [58435117, 58435152]      -   |
>>>> [23293175] chrY_random [58472079, 58472114]      +   |
>>>> [23293176] chrY_random [58483725, 58483760]      -   |
>>>> [23293177] chrY_random [58487952, 58487987]      -   |
>>>>                                   name     score
>>>>                            <character>    <numeric>
>>>>        [1]  HWI-EAS179_1:7:39:506:1302        96
>>>>        [2]  HWI-EAS179_1:2:69:562:1539       119
>>>>        [3]  HWI-EAS179_1:8:28:1327:394       119
>>>>        [4]   HWI-EAS179_1:7:96:619:454       119
>>>>        [5] HWI-EAS179_49:3:4:1219:1729       119
>>>>        [6]  HWI-EAS179_49:3:88:949:558       118
>>>>        [7] HWI-EAS179_1:7:60:1151:1790       119
>>>>        [8]  HWI-EAS179_1:7:61:1586:147       114
>>>>        [9]   HWI-EAS179_1:7:55:813:365       106
>>>>        ...                         ...       ...
>>>> [23293169] HWI-EAS179_1:7:49:1416:1573        17
>>>> [23293170]  HWI-EAS179_1:8:25:405:1723        59
>>>> [23293171] HWI-EAS179_1:7:75:1366:1224        25
>>>> [23293172]    HWI-EAS179_1:2:5:1338:80         5
>>>> [23293173]  HWI-EAS179_49:3:13:151:166        83
>>>> [23293174] HWI-EAS179_49:3:29:1091:472         6
>>>> [23293175]  HWI-EAS179_1:2:69:1424:733        17
>>>> [23293176]  HWI-EAS179_1:7:16:945:1051        25
>>>> [23293177] HWI-EAS179_1:7:74:1117:1801        14
>>>>
>>>> seqlengths
>>>>          chr1        chr10        chr11 ...         chrY  chrY_random
>>>>            NA           NA           NA ...           NA           NA
>>>>
>>>> # Here is the problem
>>>> seqlengths(Z)<- seqlengths(BSgenome)[names(seqlengths(Z))]
>>>> Error in function (classes, fdef, mtable)  :
>>>>   unable to find an inherited method for function "seqlengths", for
>>>> signature "function"
>>>>
>>>> # Here is a second attempt that also fails
>>>> seqlengths(Z)<- seqlengths(Mmusculus)[names(seqlengths(Z))]
>>>> Error in validObject(.Object) :
>>>>   invalid class "GRanges" object: slot 'ranges' contains values
>>>> outside of sequence bounds
>>>>
>>>>
>>>> As you can see, I haven't had the chance to mess the data.
>>>> Any idea how to circumvent this problem?
>>>>
>>>> Thank you,
>>>>
>>>> Ivan
>>>>
>>>>
>>>> sessionInfo()
>>>> R version 2.12.0 Under development (unstable) (2010-03-25 r51410)
>>>> x86_64-unknown-linux-gnu
>>>>
>>>> locale:
>>>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>
>>>> attached base packages:
>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>
>>>> other attached packages:
>>>> [1] BSgenome.Mmusculus.UCSC.mm9_1.3.16 BSgenome_1.15.21
>>>> [3] Biostrings_2.15.27                 GenomicRanges_0.1.17
>>>> [5] IRanges_1.5.79                     rtracklayer_1.7.12
>>>> [7] RCurl_1.4-1                        bitops_1.0-4.1
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] Biobase_2.7.6 tools_2.12.0  XML_2.8-1
>>>>
>>>>
>>>> Ivan Gregoretti, PhD
>>>> National Institute of Diabetes and Digestive and Kidney Diseases
>>>> National Institutes of Health
>>>>
>>>> _______________________________________________
>>>> Bioc-sig-sequencing mailing list
>>>> Bioc-sig-sequencing at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>>>
>>>>
>>>
>>>
>
>



More information about the Bioc-sig-sequencing mailing list