[Bioc-sig-seq] filtering by adapters in QA report

Sat Mar 26 22:01:48 CET 2011

Hi Martin and Robert

If you look at arrayQualityMetrics, consider the function
arrayQualityMetrics() as a template that a user should take and modify
to add or remove parts of the report, or to insert their own additional
data preprocessing steps.

	*	*	*	

By the way, right now the reports look good only in Firefox 4, are not
perfect in Chrome 10, and do not look right at all in Safari (and I don't
know about IE yet). If anyone has an insight on how to do the moral
equivalent of

  ssrules = document.styleSheets[0].cssRules;
  ssrules[i].style.cssText = "stroke-width:3";

in Chrome, I'd be delighted and grateful.
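
One variant that may be worth trying is to set the single property through
the standard CSSOM setProperty() call instead of overwriting the rule's
cssText. This is only a sketch: the helper name setRuleProperty is made up
for illustration, it is not part of arrayQualityMetrics, and whether it
actually changes the behaviour in Chrome 10 is an untested assumption.

```javascript
// Hypothetical helper (illustration only, not part of arrayQualityMetrics):
// set one CSS property on the i-th rule of a rule list via the CSSOM
// setProperty() call, rather than replacing the rule's whole cssText.
function setRuleProperty(rules, i, name, value) {
  if (i < 0 || i >= rules.length) {
    throw new RangeError("no rule at index " + i);
  }
  // The third argument is the priority; "" means no "!important".
  rules[i].style.setProperty(name, value, "");
  // Return what the rule now reports for that property.
  return rules[i].style.getPropertyValue(name);
}

// Browser-only usage; skipped when no document is available (e.g. in Node).
if (typeof document !== "undefined" && document.styleSheets.length > 0) {
  var ssrules = document.styleSheets[0].cssRules;
  if (ssrules.length > 0) {
    setRuleProperty(ssrules, 0, "stroke-width", "3");
  }
}
```

The helper takes the rule list as an argument so that it can be exercised
with a stand-in object outside a browser; in a page, pass
document.styleSheets[0].cssRules and the index of the rule to change.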

	Best wishes
	Wolfgang

On Mar/25/11 5:41 PM, Robert Gentleman wrote:
> On Fri, Mar 25, 2011 at 8:59 AM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>> On 03/24/2011 10:56 AM, Michael Lawrence wrote:
>>>
>>> Hi Martin,
>>>
>>> It would be nice if the ShortRead QA report could somehow filter out the
>>> adapter contamination before generating the rest of its plots, since those
>>> plots are pretty meaningless if there are adapters present.
>>>
>>> Not sure how to handle this filtering in general. That is, what if someone
>>> then wants to see plots with only the "high quality" reads after the
>>> quality plots? It gets complicated. ShortRead has a nice filtering
>>> mechanism, but this is more complicated, since some QA plots come from one
>>> filter, while others come from a different stage.
>>>
>>> However, under the assumption that no one would ever want to align an
>>> adapter, i.e., those reads will not be carried forward, the adapter removal
>>> could simply be hard-coded as a special case. More customized solutions
>>> could then leverage the internal ShortRead functions for generating each
>>> slot in the QA object, building it up incrementally on different subsets.
>>> Of course, to make sense, that would require a different report template,
>>> too.
>>
>> Hi Michael -- Yes it would be nice to be able to more flexibly control how
>> different components of the report are generated, or at least to make some
>> smarter choices along the lines you suggest for adapter contaminants. It's
>> hard to know how to make this really general, but I have come across other
>> situations where I'd like to cherry-pick which parts of the QA process I
>> want to perform. I think I need some standardization on function signatures
>> for generating each report section, tighter description of results from each
>> section (i.e., a formal class hierarchy), and then a flexible report
>> composition. It seems like quite a big task; I wonder if there are good
>> models out there to follow? arrayQualityMetrics?
>
>    I think arrayQualityMetrics is a good starting place.  Audrey and Wolfgang
> have done a good job of modularizing the components.  But there are still
> hiccups, which suggests just how hard that is.  And as you suggested, it was
> a big job.
>
>    I think the case Michael is bringing up might be useful to deal with, without
> a major rewrite.  There should be some sort of file that ShortRead has access to
> (or an input parameter) that gives some more details on the samples and on the
> processing (e.g., what the sample labels should be, and what the adapters etc. are).
> Then this information could be used in the current paradigm.
>
> Mostly the issue is that if you have adapter contamination then the
> subsequent plots (e.g., nucleotide by cycle) are not useful.  You cannot see
> anything in them, and then you have to go back and strip adapters by hand,
> then rerun ShortRead.  I agree that you may want more general filtering, as
> an abundance of any read will affect the plots, but I think there is
> agreement that one would never want to include the adapters (you do want
> counts as are produced now, but given their effect on the graphics,
> filtering would be beneficial).
>
>    best wishes
>      Robert
>>
>> Martin
>>
>>>
>>> Michael
>>>
>>>
>>> _______________________________________________
>>> Bioc-sig-sequencing mailing list
>>> Bioc-sig-sequencing at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>
>>
>> --
>> Computational Biology
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>>
>> Location: M1-B861
>> Telephone: 206 667-2793
>>
>
>
>

-- 

Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber