[Bioc-sig-seq] A bug in the AlignedRead coverage method?

Nicolas Delhomme delhomme at embl.de
Wed May 27 16:40:14 CEST 2009


Hi all,

(Martin, I guess this one is for you :-))

I have a subset of an AlignedRead

class: AlignedRead
length: 1323 reads; width: 36 cycles
chromosome: 2R 2R ... 2R 2R
position: 5866890 5867511 ... 5867007 5867434
strand: - + ... + -
alignQuality: NumericQuality
alignData varLabels: mismatch

When I call the coverage function, I get back a GenomeData obj. Fine.

Formal class 'GenomeData' [package "BSgenome"] with 4 slots
   ..@ listData       :List of 1
   .. ..$ 2R:Formal class 'Rle' [package "IRanges"] with 5 slots
   .. .. .. ..@ values         : int [1:888] 1 2 5 9 12 14 17 20 21  
22 ...
   .. .. .. ..@ lengths        : int [1:888] 7 1 2 1 1 2 1 3 2 2 ...
   .. .. .. ..@ elementMetadata: NULL
   .. .. .. ..@ elementType    : chr "ANYTHING"
   .. .. .. ..@ metadata       : list()
   ..@ elementMetadata: NULL
   ..@ elementType    : chr "ANYTHING"
   ..@ metadata       :List of 6
   .. ..$ organism       : NULL
   .. ..$ provider       : NULL
   .. ..$ providerVersion: NULL
   .. ..$ method         : chr "coverage,AlignedRead-method"
   .. ..$ coords         : chr "leftmost"
   .. ..$ extend         : int 0

What is surprising is when I look at the Rle object I got:

  'integer' Rle instance of length 5868503 with 888 runs
   Lengths:  7 1 2 1 1 2 1 3 2 2 ...
   Values :  1 2 5 9 12 14 17 20 21 22 ...

which tells me that I have a "covered" region of 5868503 bp instead of  
the 1675 bp I'm expecting.

This comes from the last set of length/value from the Rle object:

runLength(test$`2R`)[886:888]
[1]     179      36 5866828

runValue(test$`2R`)[886:888]
[1] 0 1 0

I could track it down to the AlignedRead coverage method (l.179 of  
methods-AlignedRead.R in svn rev. 39735), where the code is:

coverage(IRanges(rstart, rend), shift=1L-start, width=end-shift, ...)

As shift is negative, the line should be changed to:

coverage(IRanges(rstart, rend), shift=1L-start, width=end+shift, ...)

But this should not work, as "shift" is not defined beforehand. I  
would have expected it to crash, but apparently it is simply is  
ignored.... (there's some R voodoo going on there...) and simply  
results in this call:

coverage(IRanges(rstart, rend), shift=1L-start, width=end, ...)

Unexpected, but anyway, this corrects it:

coverage(IRanges(rstart, rend), shift=1L-start, width=end+1L-start, ...)

and gives the same results as

coverage(IRanges(rstart, rend), start=start, end=end, ...)

'integer' Rle instance of length 1675 with 887 runs
   Lengths:  7 1 2 1 1 2 1 3 2 2 ...
   Values :  1 2 5 9 12 14 17 20 21 22 ...

Best,

---------------------------------------------------------------
Nicolas Delhomme

High Throughput Functional Genomics Center

European Molecular Biology Laboratory

Tel: +49 6221 387 8426
Email: nicolas.delhomme at embl.de
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany



More information about the Bioc-sig-sequencing mailing list