[Bioc-sig-seq] Standard 454 quality checks?

Patrick Aboyoun paboyoun at fhcrc.org
Mon May 4 18:11:14 CEST 2009


Michael,
I think we need to reconcile interfaces because ShortRead's 
readBaseQuality function is intended to perform the same type of 
activity as readFastaQual. The only difference was that it was looking 
at Solexa, rather than 454, data.


Patrick


Michael Lawrence wrote:
> On Mon, May 4, 2009 at 2:19 AM, Dan Bolser <dan.bolser at gmail.com> wrote:
>
>   
>> 2009/4/23 Martin Morgan <mtmorgan at fhcrc.org>:
>>     
>>> Dan Bolser <dan.bolser at gmail.com> writes:
>>>
>>>       
>>>> Sorry for the noob question, but is there a set of standard quality
>>>> checks in R that I can run over some 454 data? I have the fasta and
>>>> the fasta format quality files as well as an sff. I scanned the manual
>>>> for the ShortReads package, but it seems focused on Illumina, I
>>>> couldn't pick out the general bits from the specifics.
>>>>         
>>> Hi Dan --
>>>
>>> You might be on somewhat uncharted territory here; most of our
>>> experience (though we have some 454 data now) is with Solexa.
>>>
>>> I don't think the standard QA pipeline, along the lines of
>>> report(qa(<...>)), will work at the moment, but I'll try to add that
>>> today.
>>>
>>> You should be able to read the fasta and quality scores with
>>> read454(). This returns a 'ShortReadQ' object, srq, that bundles the
>>> reads, their quality scores, and their ids.
>>>       
>> Hi Martin, thanks for the info on this. I'm have only just got round
>> to looking at this but I'm a little bit confused about how to read in
>> the reads / qualities.
>>
>> I had expected to say something like:
>>
>> x <- read454(fasta = "my.fas", qual.fasta = "my.qual")
>>
>>
>> or
>>
>> x <- read454(srf = "my.srf")
>>
>>
>> or
>>
>> x <- read454(fastq = "my.fastq")
>>
>>
>>
>> However, this is clearly not correct. "?read454" brings up the
>> "RochePath-class" page that suggests I try something like:
>>
>> x <- RochePath(readPath="./")
>> y<- read454(x)
>>
>>
>> But that fails too... "no input files found / pattern: \.fna$". I
>> tried to set pattern somewhere (to match the ".fas" / and ".qual"
>> patterns of my files), but I couldn't seem to set them anywhere.
>>
>> Can you help with an example of what I am doing wrong?
>>
>>     
>
>
> Thanks for this great feedback. One way to achieve what you want would be to
> call readFasta and readQual separately for each file, and then combine the
> results into a ShortReadQ. In the devel version of ShortRead (it might take
> a day to propagate), I just renamed read454 to readFastaQual and added
> separate pattern arguments for the fasta and qual file, so you could do:
> srq <- readFastaQual(".", "my.fas", "my.qual")
>
> How does that sound? I think that this interface could be made more
> convenient (such as having separate pattern arguments for the extensions and
> then a "base" pattern), but I'm just trying to stay consistent with what is
> ShortRead already.
>
> Michael
>
>
>   
>> How come I need to specify a directory and a pattern when what I
>> really want to do is just to specify a file (because that is what I
>> have)?
>>
>>
>> Thanks for any help.
>>
>> All the best,
>> Dan.
>>
>>
>>
>>     
>>> The basic touch points of the qa report for read (i.e., not aligned)
>>> data are numbers of reads, nucleotide frequencies
>>>
>>>  alphabetFrequency(sread(srq), baseOnly=TRUE, collapse=TRUE)
>>>
>>> and cycle-specific alphabet frequencies and average quality scores
>>> (use alphabetByCycle on sread(srq) and quality(srq)). For 454 it seems
>>> like a simple plot of average quality score, along the lines of
>>> alphabetScore(quality(srq)) / width(quality(srq)) against
>>> width(quality(srq)) can also be quite insightful. There might be
>>> issues where the functions expect / it makes sense to do analysis on
>>> uniform-width reads, or on groups of uniformly-widthed reads.
>>>
>>> Sorry for the only limited help.
>>>
>>> Martin
>>>
>>>
>>>       
>>>> Thanks for any help,
>>>> Dan.
>>>>
>>>> _______________________________________________
>>>> Bioc-sig-sequencing mailing list
>>>> Bioc-sig-sequencing at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>>>         
>>> --
>>> Martin Morgan
>>> Computational Biology / Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N.
>>> PO Box 19024 Seattle, WA 98109
>>>
>>> Location: Arnold Building M1 B861
>>> Phone: (206) 667-2793
>>>
>>>       
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> Bioc-sig-sequencing at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>
>>     
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>



More information about the Bioc-sig-sequencing mailing list