[Bioc-sig-seq] Assessing Transcriptome Coverage
Michael Dondrup
Michael.Dondrup at bccs.uib.no
Tue Aug 18 16:44:09 CEST 2009
Hi,
it looks as if most of the genome has a coverage of 0. That is to be
expected, but it means you will also have to adjust the ylim parameter:
otherwise the frequency of the first bins dominates the plot, which is
why you see just one bar. See ?hist.
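In such a setting I just try something like:
hist(lane1, ylim=c(0, 2000), breaks=seq(0, max(lane1, na.rm=TRUE)+100, 100))
for bins of width 100 starting from 0. The breaks have to span the whole
range of the data, including the minimum of 0, and na.rm=TRUE is needed
because lane1 contains NAs.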
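A minimal sketch of the same idea on simulated data, since the real lane1
is not available here. The vector cov_sim and its distribution are made up
for illustration; they are only meant to resemble the summary quoted below
(mostly near zero, a few very large values, some NAs). The breaks start at
0 in this sketch so that they cover the minimum of the data.

```r
set.seed(1)
# Hypothetical stand-in for lane1: mostly tiny values, a few huge ones, some NAs
cov_sim <- c(rexp(20000, rate = 2), runif(50, 1000, 40000), rep(NA, 100))

# Breaks of width 100, starting at 0 and extending past the maximum
brks <- seq(0, max(cov_sim, na.rm = TRUE) + 100, by = 100)

# plot = FALSE returns the bin counts without drawing anything
h <- hist(cov_sim, breaks = brks, plot = FALSE)
h$counts[1]   # count in the first bin, which dominates the plot

# Cap the y-axis so that the sparsely filled bins become visible
hist(cov_sim, breaks = brks, ylim = c(0, 20),
     main = "Coverage (simulated)", xlab = "coverage")
```

The first bar is simply cut off at the top of the plot, so the capped
histogram should be read together with the count of the first bin.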
In addition, the GenomeGraphs package provides methods to plot the
coverage along the chromosome, which may be of interest, too. See the
examples in the vignette.
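Sean's suggestion of a log transform, quoted further down, is another way
to make such skewed values presentable. A rough sketch on simulated data
follows; cov_sim is a made-up vector, and the +1 offset (so that zeros stay
defined) is an assumption of this sketch, not something from the thread.

```r
set.seed(2)
# Hypothetical coverage vector: many zeros, small fractions, a few large values
cov_sim <- c(rep(0, 5000), rexp(15000, rate = 2),
             runif(50, 1000, 40000), rep(NA, 100))

# log10(x + 1) maps 0 to 0 and compresses the large values
cov_log <- log10(cov_sim + 1)

# hist() drops the NAs; 'breaks = 50' is only a suggestion to the binning
hist(cov_log, breaks = 50,
     main = "Coverage (simulated), log10(x + 1)",
     xlab = "log10(coverage + 1)")
```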
Michael
On 17.08.2009 at 21:07, Abhishek Pratap wrote:
> Hi
>
> I don't have a lot of experience with plotting large amounts of data
> points, and clearly my question reflects that. :)
>
> summary(lane1)
>      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.      NA's
>     0.000     0.040     0.180     5.186     0.620 39730.000  2264.000
>
> Thanks for your help.
>
> -Abhi
>
>
> On Mon, Aug 17, 2009 at 3:05 PM, Sean Davis <seandavi at gmail.com> wrote:
>>
>>
>> On Mon, Aug 17, 2009 at 3:01 PM, Abhishek Pratap <abhishek.vit at gmail.com> wrote:
>>>
>>> Hi Sean
>>>
>>> Thanks for your suggestion on both the mailing lists. I am now reading
>>> the coverage values from a file, storing them as a data.frame, and
>>> then creating a new numeric vector for each lane. Each vector may have
>>> 15000-45000 entries. The values are numeric and differ widely in
>>> magnitude: some are between 0 and 1, e.g. (0.45, 0.89), and others are
>>> in a range like (4000, 44000). These are just random examples to
>>> illustrate the skew in the data.
>>>
>>> When I plot a histogram I just see one big bar. I feel the bins are
>>> not created effectively. I also tried a couple of different options in
>>> the R hist function, but with the same result.
>>>
>>> hist(lane2, freq=TRUE, breaks=10);
>>> hist(lane2, freq=TRUE, include.lowest=TRUE);
>>
>> What does summary(lane2) show? You may need to transform the data to
>> make it more presentable (log?).
>>
>> Sean
>>
>>
>>
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
Michael Dondrup, Ph.D.
Bergen Center for Computational Science
Computational Biology Unit
Unifob AS - Thormøhlensgate 55, N-5008 Bergen, Norway
Phone: +47 55584029 Fax: +47 55584295