[Bioc-sig-seq] Standard 454 quality checks?

Martin Morgan mtmorgan at fhcrc.org
Thu Apr 23 18:27:40 CEST 2009


Dan Bolser <dan.bolser at gmail.com> writes:

> Sorry for the noob question, but is there a set of standard quality
> checks in R that I can run over some 454 data? I have the fasta and
> the fasta format quality files as well as an sff. I scanned the manual
> for the ShortRead package, but it seems focused on Illumina; I
> couldn't pick out the general bits from the specifics.

Hi Dan --

You might be on somewhat uncharted territory here; most of our
experience (though we have some 454 data now) is with Solexa.

I don't think the standard QA pipeline, along the lines of
report(qa(<...>)), will work on 454 data at the moment, but I'll try to
add support for it today.

You should be able to read the fasta and quality files with
read454(). This returns a 'ShortReadQ' object (call it srq) that
bundles the reads, their quality scores, and their ids.
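To see what such an object bundles without real data, here is a minimal sketch, assuming ShortRead is installed, that builds a small ShortReadQ by hand with the ShortReadQ() constructor instead of calling read454(). The reads, ids and Phred+33 quality strings ('I', '#') are invented for illustration; the quality encoding read454() actually produces may differ.

```r
## Minimal sketch, assuming ShortRead (and therefore Biostrings) is installed.
## The reads, qualities and ids are made up; real data would come from read454().
suppressPackageStartupMessages(library(ShortRead))

reads <- DNAStringSet(c("ACGTACGTAC", "TTGCA", "GGGCCCAAT"))
## One quality character per base, so widths must match the reads
quals <- FastqQuality(BStringSet(c("IIIIIIIIII", "IIII#", "IIIIIII##")))
ids   <- BStringSet(c("read1", "read2", "read3"))

srq <- ShortReadQ(sread = reads, quality = quals, id = ids)

length(srq)            # number of reads
width(sread(srq))      # read lengths; these vary, as is typical for 454
as.character(id(srq))  # read identifiers
```

The accessors sread(), quality() and id() are the same ones used on an object returned by read454().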

The main components of the qa report for unaligned read data are the
number of reads, the nucleotide frequencies

  alphabetFrequency(sread(srq), baseOnly=TRUE, collapse=TRUE)

and the cycle-specific alphabet frequencies and average quality scores
(use alphabetByCycle() on sread(srq) and quality(srq)). For 454 data, a
simple plot of each read's average quality score, e.g.,
alphabetScore(quality(srq)) / width(quality(srq)), against its length,
width(quality(srq)), can also be quite insightful. Because 454 reads
vary in length, some functions may expect uniform-width reads, and some
analyses only make sense on uniform-width reads or on groups of reads
of equal width.
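The checks described above can be tried on a toy object. This is a minimal sketch, assuming ShortRead (and Biostrings) is installed; the three reads and their Phred+33 quality strings are invented, and with real data srq would come from read454() instead. It computes the read count, overall base counts, per-cycle letter counts and per-read average quality, then plots average quality against read length.

```r
## Minimal sketch, assuming ShortRead (and hence Biostrings) is installed.
## Toy reads and qualities stand in for real 454 data from read454().
suppressPackageStartupMessages(library(ShortRead))

srq <- ShortReadQ(
    sread   = DNAStringSet(c("ACGTACGTAC", "TTGCA", "GGGCCCAAT")),
    quality = FastqQuality(BStringSet(c("IIIIIIIIII", "IIII#", "IIIIIII##"))),
    id      = BStringSet(c("read1", "read2", "read3")))

## Number of reads
nReads <- length(srq)

## Overall nucleotide counts (A, C, G, T, other), summed across all reads
baseCounts <- alphabetFrequency(sread(srq), baseOnly = TRUE, collapse = TRUE)

## Letter counts per cycle: rows are letters, columns are cycles (positions)
abc <- alphabetByCycle(sread(srq))

## Per-read average quality: summed quality score divided by read length
readWidth <- width(quality(srq))
avgQ <- alphabetScore(quality(srq)) / readWidth

## Average quality against read length
plot(readWidth, avgQ, xlab = "Read length (bases)",
     ylab = "Average quality score")
```

With reads of different lengths, the later columns of abc are filled only by the longest reads, which is one reason grouping reads by width can make per-cycle summaries easier to interpret.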

Sorry I can only offer limited help.

Martin


> Thanks for any help,
> Dan.
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


