[BioC] outlier detection of fit.li.wong

James W. MacDonald jmacdon at med.umich.edu
Fri Aug 10 17:36:35 CEST 2007


Hi Yi,

Yi Xing wrote:
> Hi,
> 
> I am a little puzzled by the behavior of fit.li.wong function (affy 
> package) in conducting outlier detection. I created a matrix
> x <- sweep(matrix(2^rnorm(600),30,20),1,seq(1,2,len=30),FUN="+")
> 
> then set x[30,20] as the outlier:
> x[30,20]=9999
> 
> When I ran  fit.li.wong(x,outlier.detection=TRUE), x[30,20] was 
> recognized as an outlier, but apparently it was NOT removed from the 
> computation of theta. theta[30] is obviously affected by the single 
> outlier.

I don't see an argument for outlier.detection in fit.li.wong. If you 
mean remove.outliers, then I believe it is working as advertised.
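
One possible reason the call with outlier.detection=TRUE raised no error: 
if a function's signature ends in '...', R quietly absorbs any argument 
name it does not recognize, and the defaults stay in force. The sketch 
below uses a hypothetical stand-in, toy_fit, to show the mechanism. That 
fit.li.wong behaves the same way is an assumption; the guarded block at 
the end prints the real argument names when affy is installed.

```r
# Hypothetical stand-in for a function whose signature ends in '...'.
# It is NOT fit.li.wong; it only mimics the argument handling.
toy_fit <- function(x, remove.outliers = TRUE, ...) {
  # Anything not named in the signature lands in '...'
  extra <- names(list(...))
  list(remove.outliers = remove.outliers, ignored = extra)
}

res <- toy_fit(matrix(1, 2, 2), outlier.detection = TRUE)
res$remove.outliers  # the default is still in force
res$ignored          # the unrecognized argument was silently absorbed

# Check the real signature instead of trusting this sketch
# (only runs if the Bioconductor package 'affy' is installed).
if (requireNamespace("affy", quietly = TRUE)) {
  print(names(formals(affy::fit.li.wong)))
}
```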

Using your example:

 > all.equal(fit.li.wong(x)$theta, fit.li.wong(x, remove.outliers=FALSE)$theta)
[1] "Mean relative  difference: 2.632672"
 > cbind(fit.li.wong(x)$theta, fit.li.wong(x, remove.outliers=FALSE)$theta)
             [,1]         [,2]
  [1,]   2.477412    0.7790772
  [2,]   2.440048    0.9618730
  [3,]   2.283680    0.3710469
  [4,]   2.004736    0.3537434
  [5,]   2.302030    0.2815720
  [6,]   2.368680    0.3209734
  [7,]   2.508436    0.6738310
  [8,]   2.426458    0.7175141
  [9,]   2.397586    0.6339105
 [10,]   2.662556    0.6344126
 [11,]   2.476010    0.5114757
 [12,]   2.495807    0.4771915
 [13,]   2.801699    0.5022871
 [14,]   2.641723    0.5031644
 [15,]   3.178295    0.5674871
 [16,]   3.065739    0.3789646
 [17,]   2.741703    0.6520351
 [18,]   2.799087    0.4864749
 [19,]   2.889033    0.6559938
 [20,]   2.841164    0.6615991
 [21,]   2.825730    0.5604023
 [22,]   3.030698    0.7263582
 [23,]   2.839171    0.5869337
 [24,]   2.751788    1.2154618
 [25,]   3.026560    0.8351068
 [26,]   3.215382    0.5823551
 [27,]   3.051072    0.6278876
 [28,]   3.350610    0.5773556
 [29,]   3.841350    2.9516553
 [30,] 574.199766 2235.8471717


I think there is a difference between what you expect remove.outliers to 
do and what it actually does (e.g., remove an outlier from the 
computation of theta vs remove an outlier from a dataset and pretend it 
never existed).
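
To make the "pretend it never existed" case concrete, here is a minimal 
base-R sketch. It is not the affy implementation: fit_rank1 is a 
hypothetical helper that fits the multiplicative model 
y[i,j] ~ theta[i] * phi[j] by plain alternating least squares, with a 0/1 
weight matrix w that drops a cell from the fit entirely. The constraint 
sum(phi^2) = number of columns and the seed are assumptions made for 
illustration.

```r
# Toy rank-one fit y[i, j] ~ theta[i] * phi[j] by alternating least squares.
# 'w' is a 0/1 weight matrix; a 0 removes that cell from the fit.
# Hypothetical helper, NOT what fit.li.wong does internally.
fit_rank1 <- function(y, w = matrix(1, nrow(y), ncol(y)), iters = 200) {
  phi <- rep(1, ncol(y))
  theta <- rep(1, nrow(y))
  for (k in seq_len(iters)) {
    # Weighted least squares for theta, holding phi fixed
    theta <- as.vector((w * y) %*% phi) / as.vector(w %*% phi^2)
    # Weighted least squares for phi, holding theta fixed
    phi <- as.vector(t(w * y) %*% theta) / as.vector(t(w) %*% theta^2)
    # Rescale so that sum(phi^2) equals the number of columns
    s <- sqrt(sum(phi^2) / ncol(y))
    phi <- phi / s
    theta <- theta * s
  }
  list(theta = theta, phi = phi)
}

set.seed(1)  # arbitrary seed, only for reproducibility
x <- sweep(matrix(2^rnorm(600), 30, 20), 1, seq(1, 2, length.out = 30), FUN = "+")
x[30, 20] <- 9999

w <- matrix(1, 30, 20)
w[30, 20] <- 0  # treat the planted cell as if it were missing

theta_all    <- fit_rank1(x)$theta     # outlier left in the fit
theta_masked <- fit_rank1(x, w)$theta  # outlier dropped from the fit

cbind(all = theta_all, masked = theta_masked)[28:30, ]
```

With the cell left in, theta[30] is dominated by the 9999 and ends up in 
the thousands; with the cell masked, it falls back in line with the other 
rows. Neither column is expected to match the fit.li.wong(x) output above.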

Best,

Jim



> 
> I would like to know how to fix this. Any suggestion is welcome.
> 
> Yi Xing
> 

-- 
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623


