[BioC] read coverage per transcript for RNASeq
Martin Morgan
mtmorgan at fhcrc.org
Wed Sep 2 01:35:38 CEST 2009
Hi Jessica -- mixing up the order a bit...
> Also, while I am writing, I was interested in only looking at reads
> above a certain quality score. When I input the data with
> readAligned, I couldn’t figure out how to filter on this variable,
> so I ended up reading the data just into R, filtering that way, and
> then reading the data into ShortRead. Is there a better way to do
> this? All the transferring of the data back and forth took a lot of
> time to process.
readAligned takes a filter argument, see ?srFilter (maybe
alignQualityFilter ? 'quality' is a bit ambiguous) and the final
example on ?readAligned. Another route is along the lines of
aln[ quality(alignQuality(aln)) > certainQualityScore ]
> I have short read sequences for RNASeq that I have then aligned to
> the reference transcript sequences. The reads are from a Solexa
> platform and they were aligned using MAQ. I have imported the data
> using ShortReads (readAligned) and all looks good. However, I am
> interested in limiting the data to those transcripts with at least
> 85% coverage. I can’t find any documentation to do this that
> doesn’t involved looking at each transcript individually. Any way
> to do this for all transcripts (30,000+)?
Hmm. You say they're aligned to the reference transcript sequences, so
it sounds like chromosome(aln) gives you the transcript each is
aligned to and
cvg <- coverage(aln)
gives you a list of 'run-length-encoded' coverage vectors, one for
each transcript. I'm not sure what '85% coverage' means, but maybe
you're after something like
sum(cvg) > .85 * sapply(cvg, length)
A little more detail might lead to a clearer solution. The
bioc-sig-sequencing news groups
https://stat.ethz.ch/pipermail/bioc-sig-sequencing/
has several posts recently that might be helpful.
Hope that helps,
Martin
"Jessica Hunter" <hunter at ohsu.edu> writes:
> Hello all.
>
>
>
> I have short read sequences for RNASeq that I have then aligned to
> the reference transcript sequences. The reads are from a Solexa
> platform and they were aligned using MAQ. I have imported the data
> using ShortReads (readAligned) and all looks good. However, I am
> interested in limiting the data to those transcripts with at least
> 85% coverage. I can’t find any documentation to do this that
> doesn’t involved looking at each transcript individually. Any way
> to do this for all transcripts (30,000+)?
>
>
>
>
>
>
> Lastly, any good online resources for RNASeq applications with
> Bioconductor that people can recommend?
>
>
>
> Thank you,
>
>
>
> Jessica
>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
More information about the Bioconductor
mailing list