[Bioc-sig-seq] Standard 454 quality checks?

Dan Bolser dan.bolser at gmail.com
Mon May 4 11:19:03 CEST 2009


2009/4/23 Martin Morgan <mtmorgan at fhcrc.org>:
> Dan Bolser <dan.bolser at gmail.com> writes:
>
>> Sorry for the noob question, but is there a set of standard quality
>> checks in R that I can run over some 454 data? I have the fasta and
>> the fasta format quality files as well as an sff. I scanned the manual
>> for the ShortReads package, but it seems focused on Illumina, I
>> couldn't pick out the general bits from the specifics.
>
> Hi Dan --
>
> You might be on somewhat uncharted territory here; most of our
> experience (though we have some 454 data now) is with Solexa.
>
> I don't think the standard QA pipeline, along the lines of
> report(qa(<...>)), will work at the moment, but I'll try to add that
> today.
>
> You should be able to read the fasta and quality scores with
> read454(). This returns a 'ShortReadQ' object, srq, that bundles the
> reads, their quality scores, and their ids.

Hi Martin, thanks for the info on this. I have only just got round
to looking at this but I'm a little bit confused about how to read in
the reads / qualities.

I had expected to say something like:

x <- read454(fasta = "my.fas", qual.fasta = "my.qual")


or

x <- read454(sff = "my.sff")


or

x <- read454(fastq = "my.fastq")



However, this is clearly not correct. "?read454" brings up the
"RochePath-class" page that suggests I try something like:

x <- RochePath(readPath="./")
y <- read454(x)


But that fails too, with "no input files found / pattern: \.fna$". I
tried to set the pattern (to match the ".fas" and ".qual" extensions
of my files), but I couldn't find anywhere to set it.
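The error message indicates that the directory is scanned with the regular expression `\.fna$`. A quick base-R check shows why files ending in ".fas" or ".qual" are never picked up. The file names below are hypothetical stand-ins, and this sketch does not call ShortRead; it only illustrates the pattern matching.

```r
# Hypothetical file names standing in for the files described above.
files <- c("my.fas", "my.qual", "my.sff")

# Default pattern, copied from the error message.
default_pattern <- "\\.fna$"

# None of the files end in ".fna", so a scan with this pattern finds nothing.
grepl(default_pattern, files)
# [1] FALSE FALSE FALSE

# A pattern for the ".fas" extension matches the FASTA file instead.
grepl("\\.fas$", files)
# [1]  TRUE FALSE FALSE

# The same expressions work with list.files() to preview a directory scan:
# list.files("./", pattern = "\\.fas$")
```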

Can you help with an example of what I am doing wrong?


How come I need to specify a directory and a pattern when what I
really want to do is just to specify a file (because that is what I
have)?
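For the single-file case, here is a minimal base-R sketch that reads a FASTA file and its FASTA-format quality file directly, then checks that read ids and lengths agree. The helpers `read_fasta_records()` and `read_qual_records()` are hypothetical and are not part of ShortRead. The sketch also assumes the read id is the first whitespace-delimited token after ">" and that quality values are whitespace-separated integers.

```r
# Hypothetical helper (not a ShortRead function): read any FASTA-formatted
# file into a named character vector, one element per record.
# `sep` joins a record's body lines: "" for sequences, " " for qualities.
read_fasta_records <- function(path, sep = "") {
  lines <- readLines(path)
  lines <- lines[nzchar(trimws(lines))]          # drop blank lines
  is_header <- startsWith(lines, ">")
  record <- cumsum(is_header)                    # record index per line
  ids <- sub("^>(\\S+).*$", "\\1", lines[is_header])  # id = first token
  groups <- split(lines[!is_header],
                  factor(record[!is_header], levels = seq_along(ids)))
  body <- vapply(groups, paste, character(1), collapse = sep)
  names(body) <- ids
  body
}

# Hypothetical helper: parse a FASTA-format quality file into a named
# list of integer vectors, one per read.
read_qual_records <- function(path) {
  txt <- read_fasta_records(path, sep = " ")
  q <- lapply(strsplit(trimws(txt), "\\s+"), as.integer)
  names(q) <- names(txt)
  q
}

# Usage with two tiny temporary files (toy data, not real 454 output).
fas <- tempfile(fileext = ".fas")
qual <- tempfile(fileext = ".qual")
writeLines(c(">r1 length=6", "ACGT", "TT", ">r2", "GGC"), fas)
writeLines(c(">r1 length=6", "30 32 28", "25 20 18", ">r2", "40 40 35"), qual)

reads <- read_fasta_records(fas)
quals <- read_qual_records(qual)

# Sanity checks: same ids in the same order, one quality value per base.
stopifnot(identical(names(reads), names(quals)))
stopifnot(all(nchar(reads) == lengths(quals)))
```

Reading both files as plain text keeps the sketch independent of any directory or pattern conventions. The trade-off is that it holds everything in memory and does none of the validation a dedicated reader would.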


Thanks for any help.

All the best,
Dan.



> The basic touch points of the qa report for read (i.e., not aligned)
> data are numbers of reads, nucleotide frequencies
>
>  alphabetFrequency(sread(srq), baseOnly=TRUE, collapse=TRUE)
>
> and cycle-specific alphabet frequencies and average quality scores
> (use alphabetByCycle on sread(srq) and quality(srq)). For 454 it seems
> like a simple plot of average quality score, along the lines of
> alphabetScore(quality(srq)) / width(quality(srq)) against
> width(quality(srq)) can also be quite insightful. Some functions may
> expect uniform-width reads, and some analyses only make sense on
> uniform-width reads or on groups of reads of equal width.
>
> Sorry I can only offer limited help.
>
> Martin
>
>
>> Thanks for any help,
>> Dan.
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> Bioc-sig-sequencing at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>
> --
> Martin Morgan
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793
>
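Martin's QA touch points can be approximated in base R on plain vectors, which may help while the standard QA report does not handle 454 data. This is a sketch on toy data (not real 454 output), not the ShortRead implementation. It counts reads, tabulates nucleotides over all reads (an analogue of `alphabetFrequency(..., collapse=TRUE)`), and computes each read's mean quality (an analogue of `alphabetScore(q) / width(q)`) for plotting against width. It also averages quality per cycle using only the reads long enough to reach that cycle, which is one way around the uniform-width issue mentioned above.

```r
# Toy data standing in for reads and their per-base quality scores.
reads <- c(r1 = "ACGTTT", r2 = "GGCA", r3 = "TTAGC")
quals <- list(r1 = c(30L, 32L, 28L, 25L, 20L, 18L),
              r2 = c(40L, 40L, 35L, 30L),
              r3 = c(38L, 36L, 30L, 22L, 15L))

# Number of reads.
n_reads <- length(reads)

# Nucleotide frequencies collapsed over all reads.
bases <- unlist(strsplit(reads, ""), use.names = FALSE)
base_freq <- table(factor(bases, levels = c("A", "C", "G", "T")))

# Mean quality per read, to plot against read width.
widths <- lengths(quals)
mean_qual <- vapply(quals, mean, numeric(1))
# plot(widths, mean_qual, xlab = "read width", ylab = "mean quality")

# Mean quality per cycle, using only reads that reach that cycle,
# so shorter reads do not contribute missing values at later cycles.
qual_by_cycle <- vapply(seq_len(max(widths)), function(i) {
  mean(vapply(quals[widths >= i], `[`, integer(1), i))
}, numeric(1))
```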
