[BioC] objective criterion for identification of outlying arrays by pca

Kevin R. Coombes kevin.r.coombes at gmail.com
Wed Nov 2 16:12:45 CET 2011


The squared Mahalanobis distance (essentially Hotelling's T^2 statistic) 
of each array from the center of a D-dimensional principal component 
space should, under some sensible null hypothesis such as multivariate 
normality, approximately follow a chi-squared distribution with D 
degrees of freedom.  You can thus test for outliers using the p-value 
associated with the chi-squared statistic.  (We used this idea for QC in 
a serum proteomics study a long time ago: Coombes et al, Clin Chem 2003; 
49:1615-23.)
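
As a rough illustration of the test described above, here is a minimal 
sketch in plain Python (standard library only).  It is not the code from 
the 2003 paper.  It assumes you already have the scores of each array on 
the first D principal components, which are uncorrelated by construction, 
so the squared distance reduces to a sum of squared standardized scores.  
The function names and the example scores are made up for illustration, 
and the 0.05 cutoff is an arbitrary choice.  Because the variances are 
estimated from the same arrays being tested, the p-values are only 
approximate.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with
    integer degrees of freedom df, via the incomplete-gamma recurrence."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)               # exact tail for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))    # exact tail for df = 1
    while a < df / 2.0:
        # Step from shape a to a + 1 (that is, from df to df + 2).
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return min(q, 1.0)


def pc_outlier_pvalues(scores):
    """scores: one row per array, one column per principal component.
    Returns a list of (squared distance, p-value) pairs, one per array."""
    n, d = len(scores), len(scores[0])
    means = [sum(row[k] for row in scores) / n for k in range(d)]
    variances = [
        sum((row[k] - means[k]) ** 2 for row in scores) / (n - 1)
        for k in range(d)
    ]
    results = []
    for row in scores:
        # PC scores are uncorrelated, so the covariance matrix is diagonal.
        t2 = sum((row[k] - means[k]) ** 2 / variances[k] for k in range(d))
        results.append((t2, chi2_sf(t2, d)))
    return results


def flag_outliers(scores, alpha=0.05):
    """Indices of arrays whose chi-squared p-value falls below alpha."""
    pvals = pc_outlier_pvalues(scores)
    return [i for i, (_, p) in enumerate(pvals) if p < alpha]


# Hypothetical scores on two PCs for 12 arrays; the last one sits far out.
example_scores = [
    (1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (-1, -1),
    (1, -1), (-1, 1), (0, 0), (0, 0), (0, 0), (6, 0),
]
print(flag_outliers(example_scores))  # only the last array (index 11)
```

Note that a gross outlier inflates the estimated variances and so partly 
masks itself; a robust estimate of center and spread may be preferable 
in practice.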

     Kevin

On 11/2/2011 9:11 AM, James W. MacDonald wrote:
> Hi Rich,
>
> On 11/2/2011 10:04 AM, Richard Friedman wrote:
>> Dear Bioconductor List,
>>
>>     Does anyone know of an objective criterion for the identification 
>> of outlying arrays by PCA?
>
> I don't know an objective criterion for this. However, unless the 
> 'outlier' is ridiculously bad, you might be better off using array 
> weights to down-weight the offending array(s). In limma, the 
> arrayWeights() and arrayWeightsSimple() functions allow you to 
> generate weights that you can then feed into lmFit().
>
> Best,
>
> Jim
>
>
>>
>>     I usually do this subjectively. However, the experimental 
>> investigator whom I am helping has a different subjective sense than 
>> I do, so I wonder if there is a hard-and-fast criterion.
>>
>> Thanks and best wishes,
>> Rich
>> ------------------------------------------------------------
>> Richard A. Friedman, PhD
>> Associate Research Scientist,
>> Biomedical Informatics Shared Resource
>> Herbert Irving Comprehensive Cancer Center (HICCC)
>> Lecturer,
>> Department of Biomedical Informatics (DBMI)
>> Educational Coordinator,
>> Center for Computational Biology and Bioinformatics (C2B2)/
>> National Center for Multiscale Analysis of Genomic Networks (MAGNet)
>> Room 824
>> Irving Cancer Research Center
>> Columbia University
>> 1130 St. Nicholas Ave
>> New York, NY 10032
>> (212)851-4765 (voice)
>> friedman at cancercenter.columbia.edu
>> http://cancercenter.columbia.edu/~friedman/
>>
>> I am a Bayesian. When I see a multiple-choice question on a test and 
>> I don't know the answer I say "eeney-meaney-miney-moe".
>>
>> Rose Friedman, Age 14
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: 
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>


