[Bioc-sig-seq] readAligned and coverage methods of the short read package

Martin Morgan mtmorgan at fhcrc.org
Thu Aug 27 04:30:09 CEST 2009


Sirisha Sunkara wrote:
> Hi Martin,
> 
> This is great, exactly what I was asking for - Thanks a lot!
> 
> Sorry, I didn't realize how frequently the changes are being made at-least to the ShortRead Package. In my installation, the coverage function still seems to return the "GenomeData" object.
> 
> Is the svn Changelog ( http://fgc.lsi.umich.edu/cgi-bin/blosxom.cgi) the best place to look up all the updates?

Hi Sirisha --

That's a very accessible location; the package also has a NEWS file that
is updated fairly consistently when there are major changes. Try

  noquote(readLines(system.file("NEWS", package="ShortRead")))

(also for Biostrings). The svn log is also available directly, with an
appropriate client

  svn log -v
https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/ShortRead

Martin

> 
> Thanks Again.
> 
>> sessionInfo()
> R version 2.10.0 Under development (unstable) (2009-08-19 r49321)
> x86_64-unknown-linux-gnu
> 
> locale:
> [1] C
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
> [1] ShortRead_1.3.25   lattice_0.17-25    BSgenome_1.13.10   Biostrings_2.13.32
> [5] IRanges_1.3.56
> 
> loaded via a namespace (and not attached):
> [1] Biobase_2.5.5 grid_2.10.0   hwriter_1.1   tools_2.10.0
> 
> --sirisha
> 
> ----- Original Message -----
> From: Martin Morgan <mtmorgan at fhcrc.org>
> Date: Monday, August 24, 2009 10:56 pm
> Subject: Re: [Bioc-sig-seq] readAligned and coverage methods of the short read package
> To: SSunkara at lbl.gov
> Cc: bioc-sig-sequencing at r-project.org
> 
>> Hi Sirisha --
>>
>> It's important to mention sessionInfo(). Until recently coverage()
>> returned a 'GenomeData' object, more recently it returns a
>> SimpleRleList, which simplifies working with other parts of the
>> Bioconductor tools.
>>
>> For my setup (based on the development version of R and current
>> Bioconductor packages) I can do
>>
>> library(ShortRead)
>> example(readAligned) # sets up next line, which is the last example
>> aln = readAligned(sp, "s_2_export.txt", filter=filt)
>> cvg = coverage(aln)
>>
>> I then have
>>
>>> cvg
>> SimpleRleList: 5 elements
>> names(5): chr1.fa chr2.fa chr3.fa chr4.fa chr5.fa
>>> cvg[["chr1.fa"]]
>>  'integer' Rle instance of length 189122529 with 44 runs
>>  Lengths:  3393024 35 9077058 35 12148400 35 1576458 35 52888550 
>> 35 ...
>>  Values :  0 1 0 1 0 1 0 1 0 1 ...
>>
>> You can think of 'coverage' like a list, with one element for each
>> 'chromosome'. The elements of the list are 'Rle' objects, meaning that
>> they are a run-length encoded representation of coverage. So above 
>> thereare 3393024 0's (i.e., uncovered nucleotides), followed by 35 
>> 1's (i.e.,
>> nucleotides covered by one read), and so on. The Rle ends at the 
>> end of
>> the last read, or if you'd provided coverage with an appropriate 'end'
>> argument (see ?'coverage,AlignedRead-method') an arbitrarily 
>> determinednucleotide, e.g., the last nucleotide on the chromosome, 
>> so padding with
>> 0 coverage from the end of the last read to the end of the chromosome.
>>
>> GenomeData has a very similar structure.
>>
>> Some more below...
>>
>> Sirisha Sunkara wrote:
>>> Hello All,
>>>
>>> I am looking at read coverage bias in a few samples (some libraries
>>> constructed under different PCR amplification cycles), and would 
>> like to
>>> look at the coverage plots as % read count distribution/lane  
>> over the 
>>> chromosome/reference coordinates.
>> Because SimpleRleList is a list, you can do something like
>>
>>  lapply(cvg, sum)
>>
>> to get the total coverage on a per-chromosome basis, and
>>
>>  laneCnt = sum(unlist(lapply(cvg, sum)))
>>
>> to get the total coverage per lane (assuming you'd read only one lane
>> into aln with readAligned; this will often be a good strategy anyway,
>> because the aligned reads are very large but coverage is trivially
>> small). You can normalize by lane with something along the lines of
>>
>>  cvg1 = cvg / laneCnt
>>
>> (if normalizing by lane is appropriate...). In R 2.9.1 the operations
>> are quite similar, e.g., lapply(cvg, "/", laneCnt) gives a simple list
>> of coverage Rle's.
>>
>>> Right now reading the export.txt files with the readAligned and the
>>> coverage methods, I get the raw counts. Is there something that 
>> does the
>>> conversion to % over the total read count/lane before plotting?
>>>
>>> Sorry, if I am missing something obvious..
>>>
>>> Also, I was wondering if the coverage method retains information of
>>> regions with "zero" coverage? This would be particularly useful when
>>> comparing sequencing bias in different samples and/or different 
>> library> construction methods.
>>
>> yes, especially if you provide the 'end' argument to coverage. 
>> These are
>> the zeros in the Rle. You can do fun things like
>>
>>  sum(cvg[[1]] == 0)
>>
>> to find out the total number of uncovered nucleotides, or
>>
>>  slice(cvg, upper=0)
>>
>> to get a 'view' of all uncovered nucleotides.
>>
>> I'm not sure if that's what you're asking, or whether you had 
>> somethingelse in mind?
>>
>> Martin
>>
>>> sessionInfo()
>> R version 2.10.0 Under development (unstable) (2009-08-21 r49350)
>> x86_64-unknown-linux-gnu
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>> [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] ShortRead_1.3.27   lattice_0.17-25    BSgenome_1.13.10
>> Biostrings_2.13.33
>> [5] IRanges_1.3.59
>>
>> loaded via a namespace (and not attached):
>> [1] Biobase_2.5.5 grid_2.10.0   hwriter_1.1
>>
>>
>>> Thanks a lot for any pointers,
>>> Sirisha
>>>
>>> _______________________________________________
>>> Bioc-sig-sequencing mailing list
>>> Bioc-sig-sequencing at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>
>>



More information about the Bioc-sig-sequencing mailing list