[BioC] microarray outlier detection

Gordon K Smyth smyth at wehi.EDU.AU
Sun Sep 1 03:21:32 CEST 2013


Dear jial2,

As others have pointed out, a sample should only be removed as an outlier 
if it is quite clear cut, preferably with some identifiable cause.

It is, however, possible to allow for some samples being less reliable 
than others, and it is usually preferable to do this in a graduated way 
rather than by removing samples completely.  In my lab, we regularly use 
the arrayWeights() function in the limma package to identify and 
downweight lower quality microarray samples.  The estimated weights then 
feed directly into the usual differential expression analysis.  See

   http://www.biomedcentral.com/1471-2105/7/261/

and Chapter 14 of the limma User's Guide.
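
A minimal sketch of the array-weights workflow described above, run on 
simulated data rather than on the poster's arrays.  The matrix y, the group 
labels A to D, and the artificially noisy fifth sample are all assumptions 
made for illustration; with real data, y would be the normalized log2 
expression values (for example, an ExpressionSet produced by rma() in oligo).

```r
library(limma)

set.seed(1)

# Hypothetical layout matching the question: 4 groups, 3 replicates each.
group <- factor(rep(c("A", "B", "C", "D"), each = 3))

# Simulated log-expression matrix: 1000 probes x 12 samples (illustration only).
y <- matrix(rnorm(1000 * 12), nrow = 1000, ncol = 12)

# Make sample 5 artificially noisier to mimic a lower quality array.
y[, 5] <- y[, 5] + rnorm(1000, sd = 2)

# Design matrix with group A as the reference level.
design <- model.matrix(~ group)

# Estimate one relative quality weight per sample.
aw <- arrayWeights(y, design)
print(round(aw, 2))

# The weights feed into the usual linear model and empirical Bayes steps.
fit <- lmFit(y, design, weights = aw)
fit <- eBayes(fit)

# Top-ranked probes for group B versus group A.
tab <- topTable(fit, coef = "groupB")
print(tab)
```

Samples with weights well below 1 are treated as less reliable and contribute 
less to the fit, but they are not discarded, so no arbitrary cutoff for 
"outlier" is needed.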

Of course, if your samples don't cluster at all, it may simply be that 
your groups are just not systematically different, and no refinement to 
the analysis will make them so.

Best wishes
Gordon

> Date: Fri, 30 Aug 2013 13:32:33 -0700 (PDT)
> From: "guest [guest]" <guest at bioconductor.org>
> To: bioconductor at r-project.org, jial2 at mail.nih.gov
> Subject: [BioC] microarray outlier detection
>
>
> Dear users,
>
> I have Human Gene 2.0 ST array data: 12 samples in 4 groups, with 3 
> replicates per group. The lab person would like to remove one sample 
> from each group as an outlier, but in the PCA plot the samples do not 
> cluster, so it is hard to single out any sample as an outlier. Is 
> there a package or function for outlier detection on microarray 
> data?
>
> Thanks,
>
>
> -- output of sessionInfo():
>
>> sessionInfo()
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] pd.hugene.2.0.st_3.8.0                oligo_1.24.1                          oligoClasses_1.22.0                   hugene20sttranscriptcluster.db_2.12.1
> [5] org.Hs.eg.db_2.9.0                    RSQLite_0.11.4                        DBI_0.2-7                             AnnotationDbi_1.22.6
> [9] Biobase_2.20.1                        BiocGenerics_0.6.0                    limma_3.16.6
>
> loaded via a namespace (and not attached):
> [1] affxparser_1.32.3     affyio_1.28.0         BiocInstaller_1.10.3  Biostrings_2.28.0     bit_1.1-10            codetools_0.2-8
> [7] ff_2.2-11             foreach_1.4.1         GenomicRanges_1.12.4  IRanges_1.18.2        iterators_1.0.6       preprocessCore_1.22.0
> [13] splines_3.0.1         stats4_3.0.1          zlibbioc_1.6.0
