[Bioc-sig-seq] GRanges, failure assigning chromosome lengths

Patrick Aboyoun paboyoun at fhcrc.org
Mon Apr 26 22:35:45 CEST 2010


Ivan,
Could you provide me with the results of

range(Z)

for all three of the GRanges objects that seqlengths<- throws and error 
on? I would like to see if there is some adjustment we can make to the 
seqlengths<- function that will resolve your issue.

Thanks,
Patrick


On 4/26/10 12:57 PM, Ivan Gregoretti wrote:
> Hi Patrick,
>
> You are correct. The validity check is rejecting the input. The
> problem with that is that I have three set, all of them failing perhap
> because one or two tags are out of bounds.
>
> This tags are coming straight from the Illumina sequencer.  There is
> no manipulation. Perhaps it is complaining because the validity check
> does not know that Mitochondrial DNA is circular.
>
> Bottom line, anybody attempting to load the data as GRanges from a
> high coverage tag set will find this seqlengths() rejection.
>
> Is there any way to override it or should I just refrain from
> assigning chromosome lengths? The online vignette does not mention if
> there is a switch to the validity check.
>
> Thank you,
>
> Ivan
>
>
>
> Ivan Gregoretti, PhD
> National Institute of Diabetes and Digestive and Kidney Diseases
> National Institutes of Health
> 5 Memorial Dr, Building 5, Room 205.
> Bethesda, MD 20892. USA.
> Phone: 1-301-496-1592
> Fax: 1-301-496-9878
>
>
>
> On Mon, Apr 26, 2010 at 3:42 PM, Patrick Aboyoun<paboyoun at fhcrc.org>  wrote:
>    
>> Ivan,
>> As you probably already realized, the first error you encountered was do to
>> a misuse of the seqlengths function since function objects (e.g. BSgenome)
>> have no sequence lengths.
>>
>> # Here is the problem
>> seqlengths(Z)<- seqlengths(BSgenome)[names(seqlengths(Z))]
>> Error in function (classes, fdef, mtable) :
>> unable to find an inherited method for function "seqlengths", for
>> signature "function"
>>
>> The second error is the result of a failed validity check on the modified
>> object. All ranges stored in a GRanges object must be between 1 and
>> seqlengths(object) if the seqlengths information is non-NAs.
>>
>> # Here is a second attempt that also fails
>> seqlengths(Z)<- seqlengths(Mmusculus)[names(seqlengths(Z))]
>> Error in validObject(.Object) :
>> invalid class "GRanges" object: slot 'ranges' contains values
>> outside of sequence bounds
>>
>> Compare the results of range(Z), which returns the min start and max end for
>> each of the seqnames in Z, and compare it with
>> seqlengths(Mmusculus)[names(seqlengths(Z))]. This should provide you with
>> some insight as to which ranges are out of bounds. Perhaps your intervals
>> are 0-based instead of 1-based?
>>
>>
>> Cheers,
>> Patrick
>>
>>
>> On 4/26/10 11:08 AM, Ivan Gregoretti wrote:
>>      
>>> Hello listers,
>>>
>>> Is anybody having trouble assigning seqlengths() to a GRanges instance
>>> with the new GenomicRanges version?
>>>
>>>
>>> This morning I upgraded my GenomicRanges from 0.0.9 to 0.1.17 and
>>> since then I am unable to assign chromosome lengths to any of my tag
>>> sets from my Illumina 36 nucleotide sequences.
>>>
>>> On Friday this worked. Let me show you how it complains now:
>>>
>>> Z<- import('millionsoftags.bed.gz', 'bed')
>>>
>>> Z<- as(Z, 'GRanges')
>>>
>>> class(Z)
>>> [1] "GRanges"
>>> attr(,"package")
>>> [1] "GenomicRanges"
>>>
>>> Z
>>> GRanges with 23293177 ranges and 2 elementMetadata values
>>>                seqnames               ranges strand   |
>>>                   <Rle>               <IRanges>     <Rle>      |
>>>         [1]        chr1   [3000506, 3000541]      +   |
>>>         [2]        chr1   [3001061, 3001096]      -   |
>>>         [3]        chr1   [3001075, 3001110]      -   |
>>>         [4]        chr1   [3001098, 3001133]      +   |
>>>         [5]        chr1   [3001310, 3001345]      +   |
>>>         [6]        chr1   [3001559, 3001594]      +   |
>>>         [7]        chr1   [3001603, 3001638]      +   |
>>>         [8]        chr1   [3001603, 3001638]      +   |
>>>         [9]        chr1   [3001609, 3001644]      -   |
>>>         ...         ...                  ...    ... ...
>>> [23293169] chrY_random [58402685, 58402720]      +   |
>>> [23293170] chrY_random [58403358, 58403393]      +   |
>>> [23293171] chrY_random [58406154, 58406189]      +   |
>>> [23293172] chrY_random [58411077, 58411112]      -   |
>>> [23293173] chrY_random [58430677, 58430712]      +   |
>>> [23293174] chrY_random [58435117, 58435152]      -   |
>>> [23293175] chrY_random [58472079, 58472114]      +   |
>>> [23293176] chrY_random [58483725, 58483760]      -   |
>>> [23293177] chrY_random [58487952, 58487987]      -   |
>>>                                    name     score
>>>                             <character>    <numeric>
>>>         [1]  HWI-EAS179_1:7:39:506:1302        96
>>>         [2]  HWI-EAS179_1:2:69:562:1539       119
>>>         [3]  HWI-EAS179_1:8:28:1327:394       119
>>>         [4]   HWI-EAS179_1:7:96:619:454       119
>>>         [5] HWI-EAS179_49:3:4:1219:1729       119
>>>         [6]  HWI-EAS179_49:3:88:949:558       118
>>>         [7] HWI-EAS179_1:7:60:1151:1790       119
>>>         [8]  HWI-EAS179_1:7:61:1586:147       114
>>>         [9]   HWI-EAS179_1:7:55:813:365       106
>>>         ...                         ...       ...
>>> [23293169] HWI-EAS179_1:7:49:1416:1573        17
>>> [23293170]  HWI-EAS179_1:8:25:405:1723        59
>>> [23293171] HWI-EAS179_1:7:75:1366:1224        25
>>> [23293172]    HWI-EAS179_1:2:5:1338:80         5
>>> [23293173]  HWI-EAS179_49:3:13:151:166        83
>>> [23293174] HWI-EAS179_49:3:29:1091:472         6
>>> [23293175]  HWI-EAS179_1:2:69:1424:733        17
>>> [23293176]  HWI-EAS179_1:7:16:945:1051        25
>>> [23293177] HWI-EAS179_1:7:74:1117:1801        14
>>>
>>> seqlengths
>>>           chr1        chr10        chr11 ...         chrY  chrY_random
>>>             NA           NA           NA ...           NA           NA
>>>
>>> # Here is the problem
>>> seqlengths(Z)<- seqlengths(BSgenome)[names(seqlengths(Z))]
>>> Error in function (classes, fdef, mtable)  :
>>>    unable to find an inherited method for function "seqlengths", for
>>> signature "function"
>>>
>>> # Here is a second attempt that also fails
>>> seqlengths(Z)<- seqlengths(Mmusculus)[names(seqlengths(Z))]
>>> Error in validObject(.Object) :
>>>    invalid class "GRanges" object: slot 'ranges' contains values
>>> outside of sequence bounds
>>>
>>>
>>> As you can see, I haven't had the chance to mess the data.
>>> Any idea how to circumvent this problem?
>>>
>>> Thank you,
>>>
>>> Ivan
>>>
>>>
>>> sessionInfo()
>>> R version 2.12.0 Under development (unstable) (2010-03-25 r51410)
>>> x86_64-unknown-linux-gnu
>>>
>>> locale:
>>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>   [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] BSgenome.Mmusculus.UCSC.mm9_1.3.16 BSgenome_1.15.21
>>> [3] Biostrings_2.15.27                 GenomicRanges_0.1.17
>>> [5] IRanges_1.5.79                     rtracklayer_1.7.12
>>> [7] RCurl_1.4-1                        bitops_1.0-4.1
>>>
>>> loaded via a namespace (and not attached):
>>> [1] Biobase_2.7.6 tools_2.12.0  XML_2.8-1
>>>
>>>
>>> Ivan Gregoretti, PhD
>>> National Institute of Diabetes and Digestive and Kidney Diseases
>>> National Institutes of Health
>>>
>>> _______________________________________________
>>> Bioc-sig-sequencing mailing list
>>> Bioc-sig-sequencing at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>>
>>>        
>>
>>



More information about the Bioc-sig-sequencing mailing list