[Bioc-sig-seq] filtering by adapters in QA report

Fri Mar 25 17:41:24 CET 2011

On Fri, Mar 25, 2011 at 8:59 AM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
> On 03/24/2011 10:56 AM, Michael Lawrence wrote:
>>
>> Hi Martin,
>>
>> It would be nice if the ShortRead QA report could somehow filter out the
>> adapter contamination before generating the rest of its plots, since those
>> plots are pretty meaningless if there are adapters present.
>>
>> Not sure how to handle this filtering in general. That is, what if someone
>> then wants to see plots with only the "high quality" reads after the
>> quality plots? It gets complicated. ShortRead has a nice filtering
>> mechanism, but this is more complicated, since some QA plots come from one
>> filter, while others come from a different stage.
>>
>> However, under the assumption that no one would ever want to align an
>> adapter, i.e., those reads will not be carried forward, the adapter
>> removal could just be treated as a hard-coded special case. And then just
>> expect more customized solutions to leverage the internal ShortRead
>> functions for generating each slot in the QA object, building it up
>> incrementally, on different subsets. Of course, to make sense, that would
>> require a different report template, too.
>
> Hi Michael -- Yes it would be nice to be able to more flexibly control how
> different components of the report are generated, or at least to make some
> smarter choices along the lines you suggest for adapter contaminants. It's
> hard to know how to make this really general, but I have come across other
> situations where I'd like to cherry-pick which parts of the QA process I
> want to perform. I think I need some standardization on function signatures
> for generating each report section, tighter description of results from each
> section (i.e., a formal class  hierarchy), and then a flexible report
> composition. It seems like quite a big task; I wonder if there are good
> models out there to follow? arrayQualityMetrics?

  I think arrayQualityMetrics is a good starting place.  Audrey and Wolfgang
have done a good job of modularizing the components.  But there are still
hiccups - which suggests just how hard that is.  And as you suggested, it was
a big job.

  I think the case Michael is bringing up might be useful to deal with, without
a major rewrite.  There should be some sort of file that ShortRead has access to
(or an input parameter) that gives more details on the samples and on the
processing (e.g., what the sample labels should be and what the adapters are).
Then this information could be used in the current paradigm.

Mostly the issue is that if you have adapter contamination then the subsequent
plots (e.g., nucleotide by cycle) are not useful.  You cannot see anything in
them, and then you have to go back and strip adapters by hand, then rerun
ShortRead.  I agree that you may want more general filtering, as an abundance
of any read will affect the plots, but I think there is agreement that one
would never want to include the adapters (you do want the counts as they are
produced now, but given their effect on the graphics, filtering would be
beneficial).
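
To make the idea concrete, here is a rough sketch of "count the adapter reads,
but leave them out of the per-cycle plots".  It is not ShortRead code and uses
none of its API; it is plain Python used only as a language-neutral
illustration.  The function names, the toy reads, and the adapter sequence are
all made up, and exact substring matching is an assumption (real adapter
detection would normally allow mismatches and partial overlaps).

```python
from collections import Counter


def split_adapter_reads(reads, adapters):
    """Partition reads into (clean, contaminated) by exact substring match.

    Hypothetical helper: exact matching is a simplifying assumption.
    """
    clean, contaminated = [], []
    for read in reads:
        if any(adapter in read for adapter in adapters):
            contaminated.append(read)
        else:
            clean.append(read)
    return clean, contaminated


def nucleotide_by_cycle(reads):
    """Return one Counter of base frequencies per cycle (read position)."""
    n_cycles = max((len(r) for r in reads), default=0)
    counts = [Counter() for _ in range(n_cycles)]
    for read in reads:
        for cycle, base in enumerate(read):
            counts[cycle][base] += 1
    return counts


# Toy data; the adapter sequence is invented for illustration.
reads = ["ACGTAC", "GATCGG", "TTGACA", "GATCGG", "ACGTTT"]
adapters = ["GATCGG"]

clean, contaminated = split_adapter_reads(reads, adapters)
adapter_count = len(contaminated)        # still reported, as it is now
per_cycle = nucleotide_by_cycle(clean)   # plotted from clean reads only
```

The adapter list here plays the role of the sample/processing information
mentioned above: it would be supplied once (from a file or an argument) and
applied before any of the per-cycle summaries are computed.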

  best wishes
    Robert
>
> Martin
>
>>
>> Michael
>>
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> Bioc-sig-sequencing at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>
>
> --
> Computational Biology
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>
> Location: M1-B861
> Telephone: 206 667-2793
>

-- 
Robert Gentleman
rgentlem at gmail.com