[BioC] Coverage vs GC bias - how to?

Marc Noguera mnoguera at imppc.org
Tue Sep 7 10:59:35 CEST 2010


Hi all,

I am trying to gain some insight into some NGS data. I have an
alignment file and an AlignedRead object built from it, from which I can
compute coverage.
What I would like to know is whether the coverage is biased with respect
to GC content.
My idea is to slide a window of width N along the reference sequence the
reads are aligned to and to compute, for each window, some kind of
coverage coefficient from the coverage I get from IRanges and the
AlignedRead object.
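
To make the windowing step concrete, here is a minimal sketch in plain
Python rather than IRanges. It assumes the coverage is already available
as one integer per reference base and takes the mean coverage of each
window as the "coverage coefficient". The names window_starts,
window_means, width and step are hypothetical and do not belong to any
Bioconductor package.

```python
def window_starts(length, width, step):
    """Return the 0-based start of every full window of `width` bases."""
    # Windows that would run past the end of the sequence are dropped.
    return list(range(0, length - width + 1, step))


def window_means(coverage, width, step):
    """Return the mean per-base coverage of each full window."""
    return [
        sum(coverage[start:start + width]) / width
        for start in window_starts(len(coverage), width, step)
    ]


# Toy per-base coverage vector (assumed data, for illustration only).
coverage = [0, 2, 4, 4, 6, 8, 8, 2]
print(window_means(coverage, width=4, step=2))  # [2.5, 5.5, 6.0]
```

With step equal to width the windows tile the reference without
overlap; a smaller step gives overlapping windows, which smooths the
profile but makes neighbouring values correlated.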

Since I know the coordinates of each window, the next step is to obtain
the sequence of the window and calculate its GC content.

How can I create the sliding window, and how can I extract the sequence
within it from BSgenome? I see that I can use getSeq to obtain the
sequence and then use alphabetFrequency on it. However, I don't know how
to create the sliding window.
What is the best way to proceed? Am I reinventing the wheel?
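
To illustrate the whole idea end to end, here is a rough sketch in plain
Python rather than BSgenome. It assumes the reference is an ordinary
string and the coverage is one integer per base. It pairs the GC
fraction of each window with its mean coverage and then computes a
Pearson correlation as one possible measure of bias. The names
gc_fraction, gc_and_coverage and pearson are hypothetical helpers, not
functions from any Bioconductor package.

```python
from math import sqrt


def gc_fraction(seq):
    """Fraction of G and C among the A, C, G, T bases of `seq`."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    # A window with no unambiguous bases (e.g. all N) has no defined GC.
    return gc / acgt if acgt else float("nan")


def gc_and_coverage(reference, coverage, width, step):
    """Return (GC fraction, mean coverage) for each full window."""
    pairs = []
    for start in range(0, len(reference) - width + 1, step):
        window_seq = reference[start:start + width]
        window_cov = coverage[start:start + width]
        pairs.append((gc_fraction(window_seq), sum(window_cov) / width))
    return pairs


def pearson(pairs):
    """Pearson correlation of the (x, y) pairs; NaN if undefined."""
    n = len(pairs)
    if n == 0:
        return float("nan")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return float("nan")  # no variation in GC or in coverage
    return sxy / sqrt(sxx * syy)


# Toy data (assumed, for illustration only).
reference = "ATATGCGCATGC"
coverage = [1, 1, 1, 1, 5, 5, 5, 5, 3, 3, 3, 3]
pairs = gc_and_coverage(reference, coverage, width=4, step=4)
print(pairs)           # [(0.0, 1.0), (1.0, 5.0), (0.5, 3.0)]
print(pearson(pairs))  # 1.0
```

On real data it is probably more informative to plot mean coverage
against GC fraction per window than to rely on a single correlation
value, since GC bias is often not linear.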

Thanks in advance

Marc

-- 
-----------------------------------------------------
Marc Noguera i Julian, PhD
Genomics unit / Bioinformatics
Institut de Medicina Predictiva i Personalitzada
del Càncer (IMPPC)
B-10 Office
Carretera de Can Ruti
Camí de les Escoles s/n
08916 Badalona, Barcelona


