[BioC] Coverage by base
Hervé Pagès
hpages at fhcrc.org
Wed Oct 12 19:53:58 CEST 2011
Hi Steve,
On 11-10-12 09:35 AM, Steve Lianoglou wrote:
> Hi,
>
> 2011/10/12 Hervé Pagès<hpages at fhcrc.org>:
>> Hi Rohan,
>>
>> On 11-10-11 11:49 AM, rohan bareja wrote:
>>>
>>> Hi,
>>> Ya thats correct..I have summed up the
>>> reads using viewSums but I want to get the gene information .Hi,
>>> I found this archive from bioconductor-sig-sequencing mailing list where
>>> they have discussed about cov_by_gene<- Views(cov, genes)
>>>
>>> viewSums(cov_by_gene)https://stat.ethz.ch/pipermail/bioc-sig-sequencing/attachments/20110723/b9f2c69c/attachment.pl
>>> But I am not sure what is the object "genes" that has been used along with
>>> the coverage.Thats exactly what I would like to do.
>>
>> I think 'genes' is a GRanges object object that was obtained with
>> something like (pseudo-code):
>>
>> library(GenomicFeatures)
>> txdb<- makeTranscriptDbFromUCSC(...)
>> gene_ranges<- range(transcriptsBy(txdb, by="gene"))
>> genes<- unlist(gene_ranges)
>
> I'm not so sure?
>
> I can't find any Views(*) methods defined for GRanges objects, but
> maybe I'm missing something.
Right. It needs to be coerced to RangesList first:
> genes
GRanges with 3 ranges and 0 elementMetadata values:
seqnames ranges strand
<Rle> <IRanges> <Rle>
geneA chr1 [3, 6] *
geneB chr2 [4, 7] *
geneC chr1 [5, 8] *
---
seqlengths:
chr1 chr2
NA NA
> genes_rgl <- as(genes, "RangesList")
> genes_rgl
CompressedIRangesList of length 2
$chr1
IRanges of length 2
start end width names
[1] 3 6 4 geneA
[2] 5 8 4 geneC
$chr2
IRanges of length 1
start end width names
[1] 4 7 4 geneB
Then you can use Views() on your 'cvr' object below:
> vl <- Views(cvr, genes_rgl)
> vl
SimpleRleViewsList of length 2
names(2): chr1 chr2
viewSums() and the other summarization functions (see ?viewSums) work
on this SimpleRleViewsList object:
> viewSums(vl)
SimpleIntegerList of length 2
[["chr1"]] 42 15
[["chr2"]] 18
and the names of the genes are still here:
> viewSums(vl)[["chr1"]]
geneA geneC
42 15
>
> If that's the case, I'd just go the manual route.
>
> Rohan: let's assume you have a SimpleRleList named `cvr` which
> represents the coverage vectors for each chromosome in your bam file,
> something like:
>
> R> cvr
> SimpleRleList of length 25
> $chr1
> 'integer' Rle of length 249250621 with 249520 runs
> Lengths: 14362 4 23 1 5 ... 25 974 26 12758
> Values : 0 4 5 6 5 ... 1 0 1 0
>
> $chr2
> 'integer' Rle of length 243199373 with 179635 runs
> Lengths: 10103 31 3803 43 21102 ... 27 4543 23 19174
> Values : 0 1 0 1 0 ... 1 0 1 0
>
> And say that you've done what Herve has suggested where you have a
> GRanges object names `genes` that has all of your gene info:
>
> R> genes
> seqnames ranges strand |
> <Rle> <IRanges> <Rle> |
> 1.1 chr19 [ 58858172, 58864865] - |
> 10.2 chr8 [ 18248755, 18258723] + |
> 100.3 chr20 [ 43248163, 43280376] - |
> 1000.4 chr18 [ 25530930, 25757445] - |
> 10000.5 chr1 [243651535, 244006886] - |
> 100008586.6 chrX [ 49217771, 49342266] + |
> ...
Forgot to say: get rid of those ugly/silly names with:
genes <- unlist(gene_ranges, use.names=FALSE)
names(genes) <- names(gene_ranges)
This is one of the reasons why it's important to make sure that
'gene_ranges' contains exactly one range per gene.
Cheers,
H.
>
> I'd then loop over the `names(cvr)` and do your counting as you like, maybe:
>
> R> info<- lapply(names(cvr), function(chr) {
> g<- genes[seqnames(genes) == chr]
> v<- Views(cvr[[chr]], ranges(g))
> viewSums(v)
> })
>
> I'd imagine you will want to do more book-keeping than what I'm
> showing, but here's a start.
>
> -steve
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
More information about the Bioconductor
mailing list