[Bioc-sig-seq] Assessing Transcriptome Coverage

Michael Dondrup Michael.Dondrup at bccs.uib.no
Tue Aug 18 16:44:09 CEST 2009


Hi

It looks as if the largest proportion of the genome has a coverage of
0. This is of course to be expected, but it means that you will have to
play with the ylim parameter too, because otherwise the frequency of
the first bins will dominate the plot. That is why you see just one
bar. See ?hist.

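In such a setting I just try something like:
hist(lane1, ylim=c(0, 2000), breaks=seq(0, max(lane1, na.rm=TRUE)+100, 100))
for bins of width 100 starting from 0, so that the breaks span all of
your values (your minimum is 0, and max() needs na.rm=TRUE because of
the NAs).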
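
To make this concrete, here is a minimal sketch on simulated data. The
vector cov_sim and its distribution are made up for illustration and are
not your lane1 values. It compares the count in the first bin with the
largest of the remaining bins and caps ylim accordingly, instead of
guessing a fixed value such as 2000.

```r
# Simulated, heavily skewed "coverage" values (hypothetical, not real data)
set.seed(1)
cov_sim <- c(rexp(20000, rate = 2), runif(200, min = 100, max = 40000), NA)

# Bins of width 100 that span the whole range; NAs must be ignored for max()
brk <- seq(0, max(cov_sim, na.rm = TRUE) + 100, by = 100)

# Compute the histogram without drawing it, to inspect the bin counts
h <- hist(cov_sim, breaks = brk, plot = FALSE)
first_bin <- h$counts[1]
largest_other <- max(h$counts[-1])

# Cap the y axis near the tallest of the remaining bins so they become
# visible; the first bar is simply cut off at the top of the plot
plot(h, ylim = c(0, largest_other * 1.2),
     main = "Simulated coverage", xlab = "coverage")
```

Looking at h$counts first is only one way to choose the upper limit for
ylim; picking it by eye, as in the call above, works as well.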

In addition, the package GenomeGraphs provides methods to plot the
coverage along the chromosome, which may be of interest, too. See the
examples in the vignette.
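
Sean's suggestion (quoted below) of a log transform is another option.
A rough sketch, again on simulated values: cov_sim, the pseudocount of 1
and the number of bins are all assumptions chosen for illustration, not
something taken from your data.

```r
# Simulated, heavily skewed "coverage" values (hypothetical, not real data)
set.seed(1)
cov_sim <- c(rep(0, 5000), rexp(20000, rate = 2),
             runif(200, min = 100, max = 40000), NA)

# log10(x + 1) maps zeros to 0 and compresses the large values;
# the pseudocount of 1 is an arbitrary choice
log_cov <- log10(cov_sim + 1)

# hist() drops the NAs; breaks = 50 is only a suggestion for the bin count
h_log <- hist(log_cov, breaks = 50,
              main = "Simulated coverage, log10(x + 1)",
              xlab = "log10(coverage + 1)")
```

On the log scale the small and the very large values share one axis, so
the high-coverage tail is not squeezed into a single thin bar.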

Michael

On 17.08.2009 at 21:07, Abhishek Pratap wrote:

> Hi
>
> I don't have a lot of experience with plotting large numbers of data
> points and clearly my question reflects that. :)
>
> summary(lane1)
>     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.      NA's
>    0.000     0.040     0.180     5.186     0.620 39730.000  2264.000
>
> Thanks for your help.
>
> -Abhi
>
>
> On Mon, Aug 17, 2009 at 3:05 PM, Sean Davis<seandavi at gmail.com> wrote:
>>
>>
>> On Mon, Aug 17, 2009 at 3:01 PM, Abhishek Pratap <abhishek.vit at gmail.com> wrote:
>>>
>>> Hi Sean
>>>
>>> Thanks for your suggestion on both the mailing lists. I am now reading
>>> the coverage values from a file and storing them as a data.frame and
>>> then creating a new numeric vector for each lane. Each vector may have
>>> 15000-45000 entries. The values are numeric and span a very wide
>>> range: some are between 0 and 1, e.g. (0.45, 0.89), while others are
>>> in a range like (4000, 44000). I am just taking random examples to
>>> explain the bias in the data.
>>>
>>> When I plot a histogram I just see one big bar. I feel the bins are
>>> not created effectively. I also tried a couple of different options in
>>> the R hist function but with the same result.
>>>
>>> hist(lane2, freq=TRUE, breaks=10);
>>>  hist(lane2, freq=TRUE, include.lowest=TRUE);
>>
>> What does summary(lane2) show? You may need to transform the data to
>> make it more presentable (log?).
>>
>> Sean
>>
>>
>>

Michael Dondrup, Ph.D.
Bergen Center for Computational Science
Computational Biology Unit
Unifob AS - Thormøhlensgate 55, N-5008 Bergen, Norway
Phone: +47 55584029 Fax: +47 55584295


