[BioC] outlier probes detection

Hooiveld, Guido Guido.Hooiveld at wur.nl
Wed May 9 11:51:52 CEST 2012

Hi Andrea,
I have to admit that it has been a while since I actively used Harshlight. However, AFAIK Harshlight detects both outlier arrays and outlier probes. You can have Harshlight automatically correct these outlier probes by having their (outlier) values replaced by either 'NA' or by the median value for all arrays. 
See '?Harshlight' for more details: "na.sub: If TRUE, the intensity values of the input affyBatch that are affected by defects will be changed in NA. If FALSE, the values will be substituted with the median of the intensity values of the other chips."

If you have a brief look at Figure 2 of the Harshlight paper http://www.biomedcentral.com/1471-2105/6/294 you will see that a representative image of all arrays is obtained by creating a 'median image'. Each individual array is then compared to this median image, and defects are identified as deviations from the median image using a set of criteria (again see ?Harshlight for more details).
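
To make the 'median image' idea concrete, here is a minimal sketch in base R on simulated data. It is not Harshlight's algorithm: the chip size, the noise level, the simulated defect and the fixed threshold of 1 are all assumptions made up for illustration, and the simple threshold stands in for the criteria Harshlight really uses.

```r
set.seed(1)

# Simulated data (assumption): 6 chips of 10 x 10 log-intensities.
n_side  <- 10
n_chips <- 6
chips <- array(rnorm(n_side * n_side * n_chips, mean = 8, sd = 0.2),
               dim = c(n_side, n_side, n_chips))

# Put an artificial bright blemish on chip 2.
chips[3:5, 3:5, 2] <- chips[3:5, 3:5, 2] + 3

# Median image: per-position median across a set of chips.
median_image <- function(x) apply(x, c(1, 2), median)

median_all  <- median_image(chips)                    # all 6 chips together
median_half <- median_image(chips[, , 1:3, drop = FALSE])  # only chips 1-3

# Compare chip 2 to each median image; the threshold is hypothetical.
flagged_all  <- abs(chips[, , 2] - median_all)  > 1
flagged_half <- abs(chips[, , 2] - median_half) > 1

# The reference image depends on which chips were analysed together,
# so the deviations measured for chip 2 differ between the two runs.
max(abs(median_all - median_half))
```

Because the reference image is computed from whichever chips are analysed together, the same chip is compared against a slightly different reference in each run.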
Thus, the number of arrays that are analysed together will (slightly) affect the outcome of Harshlight. If I were you I would analyse all arrays from an experiment together (in your case all 40), have Harshlight replace all outlier values by the median, and then continue with normalization using e.g. (GC)RMA.
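
A rough sketch of that workflow, assuming the 'affy' and 'Harshlight' packages are installed and all CEL files sit in one directory. The function name and its argument are hypothetical; only 'na.sub' is taken from the help page quoted above, and all other Harshlight settings are left at their defaults.

```r
# Hypothetical helper: read all CEL files, let Harshlight replace
# defective values by the median of the other chips, then run RMA.
run_harshlight_workflow <- function(cel_dir) {
  # Read every CEL file in the directory into one AffyBatch,
  # so that all arrays are analysed together.
  abatch <- affy::ReadAffy(celfile.path = cel_dir)

  # na.sub = FALSE: substitute defects with the median of the other
  # chips instead of NA, so that the normalization step can proceed.
  abatch_clean <- Harshlight::Harshlight(abatch, na.sub = FALSE)

  # Background-correct, normalize and summarize with RMA.
  affy::rma(abatch_clean)
}
```

It would be called as e.g. eset <- run_harshlight_workflow("path/to/celfiles"), where the path is a placeholder for your own directory.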


-----Original Message-----
From: andrea.grilli at ior.it [mailto:andrea.grilli at ior.it] 
Sent: Wednesday, May 09, 2012 10:20
To: Hooiveld, Guido
Cc: bioconductor at r-project.org
Subject: Re: [BioC] outlier probes detection

Hi Guido,
thank you for your reply.
I checked the package Harshlight you suggested. Although it detects outlier arrays (and not outlier probes) it works well for the case, because it gives a percentage of the defects and that's better than a simple visual evaluation.
I have one question about the package evaluation of these defects.  
Because the calculation is intensive, I first tried splitting the case study into two groups (20 and 20 arrays) and later ran all 40 chips together: according to the package output, one array should be excluded only in the second case. Does the evaluation of the defects also depend on the set of arrays a chip is analyzed with? I flipped through the relevant paper but I didn't find any information about that.

I also checked the solution proposed by Okko (thank you for your suggestion), but because it's a stronger approach I'll need more time to evaluate it.


"Hooiveld, Guido" <Guido.Hooiveld at wur.nl> ha scritto:

> Hi Andrea,
> If the affected area is relatively small (less than 5-10% of total
> area) we usually ignore these scratches/bubbles (because each probeset 
> is comprised of multiple probes, and the robust summarization methods 
> usually used within RMA (median polish or
> M-estimator) are able to handle these outliers pretty well).
> Alternatively, the package 'Harshlight' offers options to correct for 
> various types of artefacts.
> http://www.bioconductor.org/packages/2.10/bioc/html/Harshlight.html
> Regards,
> Guido
> ---------------------------------------------------------
> Guido Hooiveld, PhD
> Nutrition, Metabolism & Genomics Group Division of Human Nutrition 
> Wageningen University Biotechnion, Bomenweg 2
> NL-6703 HD Wageningen
> the Netherlands
> tel: (+)31 317 485788
> fax: (+)31 317 483342
> email:      guido.hooiveld at wur.nl
> internet:   http://nutrigene.4t.com
> http://scholar.google.com/citations?user=qFHaMnoAAAAJ
> http://www.researcherid.com/rid/F-4912-2010
> -----Original Message-----
> From: bioconductor-bounces at r-project.org
> [mailto:bioconductor-bounces at r-project.org] On Behalf Of 
> andrea.grilli at ior.it
> Sent: Tuesday, May 08, 2012 12:15
> To: bioconductor at r-project.org
> Subject: [BioC] outlier probes detection
> Dear all,
> I'm performing an analysis on HGU133plus2 arrays with 40 samples; 
> looking at their surface with "affyPLM" package, I've seen a couple of 
> arrays with small scratches and one more with a small bubble.
> Because I don't want to exclude these arrays (according to Murphy's
> law, 2 out of 3 belong to the class with fewer samples), I want to detect
> those probes and exclude them.
> I was thinking of some outlier detection method, but because I'm new
> to this problem I don't know if this is the right method and which
> packages would be appropriate (I did some research but I have no clear
> idea).
> Any help is really appreciated,
> andrea
> Dr. Andrea Grilli
> andrea.grilli at ior.it
> phone 051/63.66.756
> Laboratory of Experimental Oncology,
> Development of  Biomolecular Therapies unit, Rizzoli Orthopaedic 
> Institute Codivilla Putti Research Center via di Barbiano 1/10
> 40136 - Bologna - Italy
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:  
> http://news.gmane.org/gmane.science.biology.informatics.conductor
