[BioC] Coverage by base

Hervé Pagès hpages at fhcrc.org
Wed Oct 12 19:53:58 CEST 2011


Hi Steve,

On 11-10-12 09:35 AM, Steve Lianoglou wrote:
> Hi,
>
> 2011/10/12 Hervé Pagès<hpages at fhcrc.org>:
>> Hi Rohan,
>>
>> On 11-10-11 11:49 AM, rohan bareja wrote:
>>>
>>> Hi,
>>> Ya thats correct..I have summed up the
>>>   reads using viewSums but I want to get the gene information .Hi,
>>> I found this archive from bioconductor-sig-sequencing mailing list where
>>> they have discussed about cov_by_gene<- Views(cov, genes)
>>>
>>> viewSums(cov_by_gene)https://stat.ethz.ch/pipermail/bioc-sig-sequencing/attachments/20110723/b9f2c69c/attachment.pl
>>> But I am not sure what is the object "genes" that has been used along with
>>> the coverage.Thats exactly what I would like to do.
>>
>> I think 'genes' is a GRanges object object that was obtained with
>> something like (pseudo-code):
>>
>>   library(GenomicFeatures)
>>   txdb<- makeTranscriptDbFromUCSC(...)
>>   gene_ranges<- range(transcriptsBy(txdb, by="gene"))
>>   genes<- unlist(gene_ranges)
>
> I'm not so sure?
>
> I can't find any Views(*) methods defined for GRanges objects, but
> maybe I'm missing something.

Right. It needs to be coerced to RangesList first:

 > genes
GRanges with 3 ranges and 0 elementMetadata values:
         seqnames    ranges strand
            <Rle> <IRanges>  <Rle>
   geneA     chr1    [3, 6]      *
   geneB     chr2    [4, 7]      *
   geneC     chr1    [5, 8]      *
   ---
   seqlengths:
    chr1 chr2
      NA   NA

 > genes_rgl <- as(genes, "RangesList")

 > genes_rgl
CompressedIRangesList of length 2
$chr1
IRanges of length 2
     start end width names
[1]     3   6     4 geneA
[2]     5   8     4 geneC

$chr2
IRanges of length 1
     start end width names
[1]     4   7     4 geneB

Then you can use Views() on your 'cvr' object below:

 > vl <- Views(cvr, genes_rgl)
 > vl
SimpleRleViewsList of length 2
names(2): chr1 chr2

viewSums() and the other summarization functions (see ?viewSums) work
on this SimpleRleViewsList object:

 > viewSums(vl)
SimpleIntegerList of length 2
[["chr1"]] 42 15
[["chr2"]] 18

and the names of the genes are still here:

 > viewSums(vl)[["chr1"]]
geneA geneC
    42    15

>
> If that's the case, I'd just go the manual route.
>
> Rohan: let's assume you have a SimpleRleList named `cvr` which
> represents the coverage vectors for each chromosome in your bam file,
> something like:
>
> R>  cvr
> SimpleRleList of length 25
> $chr1
> 'integer' Rle of length 249250621 with 249520 runs
>    Lengths: 14362     4    23     1     5 ...    25   974    26 12758
>    Values :     0     4     5     6     5 ...     1     0     1     0
>
> $chr2
> 'integer' Rle of length 243199373 with 179635 runs
>    Lengths: 10103    31  3803    43 21102 ...    27  4543    23 19174
>    Values :     0     1     0     1     0 ...     1     0     1     0
>
> And say that you've done what Herve has suggested where you have a
> GRanges object names `genes` that has all of your gene info:
>
> R>  genes
>                    seqnames                 ranges strand   |
>                       <Rle>               <IRanges>   <Rle>    |
>          1.1          chr19 [ 58858172,  58864865]      -   |
>         10.2           chr8 [ 18248755,  18258723]      +   |
>        100.3          chr20 [ 43248163,  43280376]      -   |
>       1000.4          chr18 [ 25530930,  25757445]      -   |
>      10000.5           chr1 [243651535, 244006886]      -   |
> 100008586.6           chrX [ 49217771,  49342266]      +   |
> ...

Forgot to say: get rid of those ugly/silly names with:

   genes <- unlist(gene_ranges, use.names=FALSE)
   names(genes) <- names(gene_ranges)

This is one of the reasons why it's important to make sure that
'gene_ranges' contains exactly one range per gene.

Cheers,
H.

>
> I'd then loop over the `names(cvr)` and do your counting as you like, maybe:
>
> R>  info<- lapply(names(cvr), function(chr) {
>    g<- genes[seqnames(genes) == chr]
>    v<- Views(cvr[[chr]], ranges(g))
>    viewSums(v)
> })
>
> I'd imagine you will want to do more book-keeping than what I'm
> showing, but here's a start.
>
> -steve
>


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



More information about the Bioconductor mailing list