[Bioc-sig-seq] gdapply, coverage, and widths for all chromosomes

Chris Seidel seidel at phaget4.org
Mon Oct 19 23:02:44 CEST 2009


Thanks for the responses.

On Mon, Oct 19, 2009, Deepayan wrote: 
> for most downstream operations the default choice of 'width' 
> seemed OK.Could you explain why that is not sufficient in your case?

I didn't know there was a default choice. I thought that without a
'width' argument, coverage is only calculated between 0 and the
right-most read (i.e. highest integer position).

For instance, I create a GenomeData object from a set of aligned reads:

> str(aln.gd$chr2L)
List of 2
 $ -: int [1:538743] 8223524 17606696 [...] 2021681 21292267 ...
 $ +: int [1:542167] 22057110 2479626 [...] 17195586 21698792 ...

handing this to extendReads(), will extend those positions, but no where
is it known that this is drosophila data or that chr2L actually has
23,011,544 bases. So handing the extended reads to coverage(), would
only calculate coverage between 0 and the highest extended read (or so I
thought).

I want to extend the reads to the ends of the chromosomes, because what
I actually want to do is to calculate coverage for an IP sample, and a
Whole Cell Extract sample, and then divide the two (perhaps after adding
1 to each position to avoid dividing by zero) to create a ratio - and
then export to WIG.

e.g. where ypd is a GenomeDataList object:

ip1 <- extendReads(ypd$ip$chrI, seqLen=200)
wce1 <- extendReads(ypd$wce$chrI, seqLen=200)

ip1.cov <- coverage(ip1, width=230208)
wce1.cov <- coverage(wce1, width=230208)

ip1.cov <- ip1.cov + 1
wce1.cov <- wce1.cov + 1
rat <- log2(ip1.cov/wce1.cov)

export(rat, "ip1_wce1_rat.wig")

(not perfect, but goes a long way to exploring the data.)

The ratios represent signal over background, and one can investigate
normalization, smoothing, peaks, etc. I've figured out how to do all
these things with single chromosomes, but as Michael has pointed out,
it's much more natural (and elegant) to do this for all the chromosomes
at once.

-Chris

Deepayan wrote: 
>  On Mon, Oct 19, 2009 at 10:25 AM, Chris Seidel <seidel at
phaget4.org> wrote:
> > In using gdapply to get coverage over all chromosomes of a
> > GenomeDataList object:
> >
> > cov <- gdapply(extendedReads, coverage)
> >
> > how do you supply the "width" argument to coverage? When I try handing
> > it a vector of chromosome lengths, it complains that it needs a single
> > value.
> 
> You cannot. That's a limitation of gdapply, potentially fixable by
> making the chromosome lengths available as phenodata associated with
> the "GenomeDataList" object.
> 
> One reason for not addressing this limitation yet is that for most
> downstream operations the default choice of 'width' seemed OK. Could
> you explain why that is not sufficient in your case?
>
> -Deepayan



More information about the Bioc-sig-sequencing mailing list