[BioC] Coverage by base

Steve Lianoglou mailinglist.honeypot at gmail.com
Wed Oct 12 18:35:16 CEST 2011


Hi,

2011/10/12 Hervé Pagès <hpages at fhcrc.org>:
> Hi Rohan,
>
> On 11-10-11 11:49 AM, rohan bareja wrote:
>>
>> Hi,
>> Ya thats correct..I have summed up the
>>  reads using viewSums but I want to get the gene information .Hi,
>> I found this archive from bioconductor-sig-sequencing mailing list where
>> they have discussed about cov_by_gene<- Views(cov, genes)
>>
>> viewSums(cov_by_gene)https://stat.ethz.ch/pipermail/bioc-sig-sequencing/attachments/20110723/b9f2c69c/attachment.pl
>> But I am not sure what is the object "genes" that has been used along with
>> the coverage.Thats exactly what I would like to do.
>
> I think 'genes' is a GRanges object object that was obtained with
> something like (pseudo-code):
>
>  library(GenomicFeatures)
>  txdb <- makeTranscriptDbFromUCSC(...)
>  gene_ranges <- range(transcriptsBy(txdb, by="gene"))
>  genes <- unlist(gene_ranges)

I'm not so sure?

I can't find any Views(*) methods defined for GRanges objects, but
maybe I'm missing something.

If that's the case, I'd just go the manual route.

Rohan: let's assume you have a SimpleRleList named `cvr` which
represents the coverage vectors for each chromosome in your bam file,
something like:

R> cvr
SimpleRleList of length 25
$chr1
'integer' Rle of length 249250621 with 249520 runs
  Lengths: 14362     4    23     1     5 ...    25   974    26 12758
  Values :     0     4     5     6     5 ...     1     0     1     0

$chr2
'integer' Rle of length 243199373 with 179635 runs
  Lengths: 10103    31  3803    43 21102 ...    27  4543    23 19174
  Values :     0     1     0     1     0 ...     1     0     1     0

And say that you've done what Herve has suggested where you have a
GRanges object names `genes` that has all of your gene info:

R> genes
                  seqnames                 ranges strand   |
                     <Rle>              <IRanges>  <Rle>   |
        1.1          chr19 [ 58858172,  58864865]      -   |
       10.2           chr8 [ 18248755,  18258723]      +   |
      100.3          chr20 [ 43248163,  43280376]      -   |
     1000.4          chr18 [ 25530930,  25757445]      -   |
    10000.5           chr1 [243651535, 244006886]      -   |
100008586.6           chrX [ 49217771,  49342266]      +   |
...

I'd then loop over the `names(cvr)` and do your counting as you like, maybe:

R> info <- lapply(names(cvr), function(chr) {
  g <- genes[seqnames(genes) == chr]
  v <- Views(cvr[[chr]], ranges(g))
  viewSums(v)
})

I'd imagine you will want to do more book-keeping than what I'm
showing, but here's a start.

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



More information about the Bioconductor mailing list