[Bioc-sig-seq] A bug in the AlignedRead coverage method?

Nicolas Delhomme delhomme at embl.de
Thu May 28 17:43:27 CEST 2009


Hi Martin,

Thanks, it works nicely. However, when I added a width parameter, it  
kept on telling me

Error: UserArgumentMismatch
   'names(width)' (or 'names(end)') mismatch with  
'levels(chromosome(x))'
   see ?"AlignedRead-class"

whereas my vector has the proper names.

I dig up and the problem comes from the fact that my vector contains a  
numeric value and not an integer.

In the coverage function, you convert numeric width values into  
integer ones and this removes the names attrs from the vector. (see  
below for an example)

 > widths=c(21146708)
 > names(widths)<-"2R"
 > str(widths)
  Named num 21146708
  - attr(*, "names")= chr "2R"
 > as.integer(widths)
[1] 21146708

Adding these two lines fixes it (around l.191 0f methods-AlignedRead.R):

	# save the names (added line)
         w.names<-names(width)

	# convert to integer (this was l.191)
             width <- as.integer(width)

	# restore the names (added line)
         names(width)<-w.names

Hope this helps,

---------------------------------------------------------------
Nicolas Delhomme

High Throughput Functional Genomics Center

European Molecular Biology Laboratory

Tel: +49 6221 387 8426
Email: nicolas.delhomme at embl.de
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany
---------------------------------------------------------------



On 28 May 2009, at 05:47, Martin Morgan wrote:

> Hi Nicolas --
>
> Thanks for your report; S4 is I think evaluating a function argument
> in the wrong frame.
>
> In the RELEASE version of ShortRead, v. 1.2.1 available in svn now and
> via biocLite in the next 48 hours, the idea is that
>
>  coverage(aln)
>
> trims leading and trailing nucleotides with 0 coverage, so the
> coordinates of the Rle are relative to the first covered nucleotide
> (in v. 1.2.0 and devel <1.3.8 the tail of the Rle included a run of
> unintended 0 coverage). Genomic coordinates, suitable for bed files
> etc., are obtained with
>
>  coverage(aln, start=1L)
>
> In the DEVEL version of ShortRead, v. 1.3.9, the default behavior and
> preferred way of specifying coordinates has been changed to be
> consistent with recent changes in IRanges and Biostrings.
>
>  coverage(aln)
>
> returns genomic (1-based) coordinates, and the prefered way to adjust
> the coordinate system is with shift/width arguments (which when
> present are named integer vectors, with all levels(chromosome(aln))
> present). See ?"AlignedRead-class" and its example for additional
> detail.
>
> Thanks again for the report. Please let me know if I've
> mis-understood.
>
> Martin
>
> Nicolas Delhomme <delhomme at embl.de> writes:
>
>> Actually, there's no voodoo, "shift" is defined in the function call
>> to 0L, so no surprise that:
>>
>> coverage(IRanges(rstart, rend), shift=1L-start, width=end+shift, ...)
>>
>> gives the same result as
>>
>> coverage(IRanges(rstart, rend), shift=1L-start, width=end, ...`)
>>
>> And I think that my correction is not what is expected from this
>> function, right? Because "my way" you actually lose the  "offset"
>> (these almost 6Mb in my case). So, what you are doing is  actually
>> storing the offset as the last element of the Rle object,  right? Is
>> that on purpose? That would mean that if I want to plot the  coverage
>> properly for my gene (using the genomic coordinate, i.e.  using a bed
>> or wig file), I would have to move this last element as  the
>> first. Wouldn't it be easier to have an "offset" parameter for the
>> coverage function which being TRUE would put the offset at the right
>> place (i.e, the first element of the Rle) and FALSE would simply not
>> add it?
>>
>> Best,
>>
>>
>> On 27 May 2009, at 16:40, Nicolas Delhomme wrote:
>>
>>> Hi all,
>>>
>>> (Martin, I guess this one is for you :-))
>>>
>>> I have a subset of an AlignedRead
>>>
>>> class: AlignedRead
>>> length: 1323 reads; width: 36 cycles
>>> chromosome: 2R 2R ... 2R 2R
>>> position: 5866890 5867511 ... 5867007 5867434
>>> strand: - + ... + -
>>> alignQuality: NumericQuality
>>> alignData varLabels: mismatch
>>>
>>> When I call the coverage function, I get back a GenomeData obj.  
>>> Fine.
>>>
>>> Formal class 'GenomeData' [package "BSgenome"] with 4 slots
>>> ..@ listData       :List of 1
>>> .. ..$ 2R:Formal class 'Rle' [package "IRanges"] with 5 slots
>>> .. .. .. ..@ values         : int [1:888] 1 2 5 9 12 14 17 20 21
>>> 22 ...
>>> .. .. .. ..@ lengths        : int [1:888] 7 1 2 1 1 2 1 3 2 2 ...
>>> .. .. .. ..@ elementMetadata: NULL
>>> .. .. .. ..@ elementType    : chr "ANYTHING"
>>> .. .. .. ..@ metadata       : list()
>>> ..@ elementMetadata: NULL
>>> ..@ elementType    : chr "ANYTHING"
>>> ..@ metadata       :List of 6
>>> .. ..$ organism       : NULL
>>> .. ..$ provider       : NULL
>>> .. ..$ providerVersion: NULL
>>> .. ..$ method         : chr "coverage,AlignedRead-method"
>>> .. ..$ coords         : chr "leftmost"
>>> .. ..$ extend         : int 0
>>>
>>> What is surprising is when I look at the Rle object I got:
>>>
>>> 'integer' Rle instance of length 5868503 with 888 runs
>>> Lengths:  7 1 2 1 1 2 1 3 2 2 ...
>>> Values :  1 2 5 9 12 14 17 20 21 22 ...
>>>
>>> which tells me that I have a "covered" region of 5868503 bp instead
>>> of the 1675 bp I'm expecting.
>>>
>>> This comes from the last set of length/value from the Rle object:
>>>
>>> runLength(test$`2R`)[886:888]
>>> [1]     179      36 5866828
>>>
>>> runValue(test$`2R`)[886:888]
>>> [1] 0 1 0
>>>
>>> I could track it down to the AlignedRead coverage method (l.179 of
>>> methods-AlignedRead.R in svn rev. 39735), where the code is:
>>>
>>> coverage(IRanges(rstart, rend), shift=1L-start, width=end- 
>>> shift, ...)
>>>
>>> As shift is negative, the line should be changed to:
>>>
>>> coverage(IRanges(rstart, rend), shift=1L-start, width=end 
>>> +shift, ...)
>>>
>>> But this should not work, as "shift" is not defined beforehand. I
>>> would have expected it to crash, but apparently it is simply is
>>> ignored.... (there's some R voodoo going on there...) and simply
>>> results in this call:
>>>
>>> coverage(IRanges(rstart, rend), shift=1L-start, width=end, ...)
>>>
>>> Unexpected, but anyway, this corrects it:
>>>
>>> coverage(IRanges(rstart, rend), shift=1L-start, width=end+1L-
>>> start, ...)
>>>
>>> and gives the same results as
>>>
>>> coverage(IRanges(rstart, rend), start=start, end=end, ...)
>>>
>>> 'integer' Rle instance of length 1675 with 887 runs
>>> Lengths:  7 1 2 1 1 2 1 3 2 2 ...
>>> Values :  1 2 5 9 12 14 17 20 21 22 ...
>>>
>>> Best,
>>>
>>> ---------------------------------------------------------------
>>> Nicolas Delhomme
>>>
>>> High Throughput Functional Genomics Center
>>>
>>> European Molecular Biology Laboratory
>>>
>>> Tel: +49 6221 387 8426
>>> Email: nicolas.delhomme at embl.de
>>> Meyerhofstrasse 1 - Postfach 10.2209
>>> 69102 Heidelberg, Germany
>>>
>>> _______________________________________________
>>> Bioc-sig-sequencing mailing list
>>> Bioc-sig-sequencing at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>
>> ---------------------------------------------------------------
>> Nicolas Delhomme
>>
>> High Throughput Functional Genomics Center
>>
>> European Molecular Biology Laboratory
>>
>> Tel: +49 6221 387 8426
>> Email: nicolas.delhomme at embl.de
>> Meyerhofstrasse 1 - Postfach 10.2209
>> 69102 Heidelberg, Germany
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> Bioc-sig-sequencing at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>
> -- 
> Martin Morgan
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793



More information about the Bioc-sig-sequencing mailing list