[Bioc-sig-seq] readAligned and coverage methods of the short read package
Sirisha Sunkara
SSunkara at lbl.gov
Thu Aug 27 04:02:47 CEST 2009
Hi Martin,
This is great, exactly what I was asking for - Thanks a lot!
Sorry, I didn't realize how frequently the changes are being made at-least to the ShortRead Package. In my installation, the coverage function still seems to return the "GenomeData" object.
Is the svn Changelog ( http://fgc.lsi.umich.edu/cgi-bin/blosxom.cgi) the best place to look up all the updates?
Thanks Again.
> sessionInfo()
R version 2.10.0 Under development (unstable) (2009-08-19 r49321)
x86_64-unknown-linux-gnu
locale:
[1] C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ShortRead_1.3.25 lattice_0.17-25 BSgenome_1.13.10 Biostrings_2.13.32
[5] IRanges_1.3.56
loaded via a namespace (and not attached):
[1] Biobase_2.5.5 grid_2.10.0 hwriter_1.1 tools_2.10.0
--sirisha
----- Original Message -----
From: Martin Morgan <mtmorgan at fhcrc.org>
Date: Monday, August 24, 2009 10:56 pm
Subject: Re: [Bioc-sig-seq] readAligned and coverage methods of the short read package
To: SSunkara at lbl.gov
Cc: bioc-sig-sequencing at r-project.org
> Hi Sirisha --
>
> It's important to mention sessionInfo(). Until recently coverage()
> returned a 'GenomeData' object, more recently it returns a
> SimpleRleList, which simplifies working with other parts of the
> Bioconductor tools.
>
> For my setup (based on the development version of R and current
> Bioconductor packages) I can do
>
> library(ShortRead)
> example(readAligned) # sets up next line, which is the last example
> aln = readAligned(sp, "s_2_export.txt", filter=filt)
> cvg = coverage(aln)
>
> I then have
>
> > cvg
> SimpleRleList: 5 elements
> names(5): chr1.fa chr2.fa chr3.fa chr4.fa chr5.fa
> > cvg[["chr1.fa"]]
> 'integer' Rle instance of length 189122529 with 44 runs
> Lengths: 3393024 35 9077058 35 12148400 35 1576458 35 52888550
> 35 ...
> Values : 0 1 0 1 0 1 0 1 0 1 ...
>
> You can think of 'coverage' like a list, with one element for each
> 'chromosome'. The elements of the list are 'Rle' objects, meaning that
> they are a run-length encoded representation of coverage. So above
> thereare 3393024 0's (i.e., uncovered nucleotides), followed by 35
> 1's (i.e.,
> nucleotides covered by one read), and so on. The Rle ends at the
> end of
> the last read, or if you'd provided coverage with an appropriate 'end'
> argument (see ?'coverage,AlignedRead-method') an arbitrarily
> determinednucleotide, e.g., the last nucleotide on the chromosome,
> so padding with
> 0 coverage from the end of the last read to the end of the chromosome.
>
> GenomeData has a very similar structure.
>
> Some more below...
>
> Sirisha Sunkara wrote:
> > Hello All,
> >
> > I am looking at read coverage bias in a few samples (some libraries
> > constructed under different PCR amplification cycles), and would
> like to
> > look at the coverage plots as % read count distribution/lane
> over the
> > chromosome/reference coordinates.
>
> Because SimpleRleList is a list, you can do something like
>
> lapply(cvg, sum)
>
> to get the total coverage on a per-chromosome basis, and
>
> laneCnt = sum(unlist(lapply(cvg, sum)))
>
> to get the total coverage per lane (assuming you'd read only one lane
> into aln with readAligned; this will often be a good strategy anyway,
> because the aligned reads are very large but coverage is trivially
> small). You can normalize by lane with something along the lines of
>
> cvg1 = cvg / laneCnt
>
> (if normalizing by lane is appropriate...). In R 2.9.1 the operations
> are quite similar, e.g., lapply(cvg, "/", laneCnt) gives a simple list
> of coverage Rle's.
>
> > Right now reading the export.txt files with the readAligned and the
> > coverage methods, I get the raw counts. Is there something that
> does the
> > conversion to % over the total read count/lane before plotting?
> >
> > Sorry, if I am missing something obvious..
> >
> > Also, I was wondering if the coverage method retains information of
> > regions with "zero" coverage? This would be particularly useful when
> > comparing sequencing bias in different samples and/or different
> library> construction methods.
>
> yes, especially if you provide the 'end' argument to coverage.
> These are
> the zeros in the Rle. You can do fun things like
>
> sum(cvg[[1]] == 0)
>
> to find out the total number of uncovered nucleotides, or
>
> slice(cvg, upper=0)
>
> to get a 'view' of all uncovered nucleotides.
>
> I'm not sure if that's what you're asking, or whether you had
> somethingelse in mind?
>
> Martin
>
> > sessionInfo()
> R version 2.10.0 Under development (unstable) (2009-08-21 r49350)
> x86_64-unknown-linux-gnu
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] ShortRead_1.3.27 lattice_0.17-25 BSgenome_1.13.10
> Biostrings_2.13.33
> [5] IRanges_1.3.59
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.5.5 grid_2.10.0 hwriter_1.1
>
>
> >
> > Thanks a lot for any pointers,
> > Sirisha
> >
> > _______________________________________________
> > Bioc-sig-sequencing mailing list
> > Bioc-sig-sequencing at r-project.org
> > https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>
>
>
More information about the Bioc-sig-sequencing
mailing list