[BioC] Outlier removal in Beadarray bead level data analysis

Matt Ritchie Matt.Ritchie at cancer.org.uk
Thu May 31 16:57:52 CEST 2007


Dear Wei,

Thanks for pointing out this inconsistency in the documentation.  It is our
understanding that the Illumina software flags beads as outliers if their
intensity is greater than 3 median absolute deviations (MADs) from the
median intensity for that bead type.  We have checked this with some older
format BeadScan output, which marks the outlier status next to the bead
intensity, and the calls line up pretty well (49766 out of 49777 agree,
beadarray calls 9 beads as outliers which Illumina says are not, and
Illumina calls 2 outlier beads which beadarray misses.  The discrepancies
occur for beads close to the (med-3*mad, med+3*mad) boundaries, which would
suggest to me that they are due to rounding.

I have updated the man pages for findAllOutliers(), findBeadStutus() and
createBeadSummaryData() in beadarray 1.5.3, replacing 'means' with 'medians'
in the appropriate places.

Apologies for the inaccuracy.  Best wishes,

Matt

> Dear Mark:
> 
>     I used the function "findAllOutliers()" in the beadarray package to
> detect the outliers in the sample bead level data (BLData) enclosed with
> the beadarray package. However I found this function did not do what it
> is supposed to do. The documentation for this function
> "findAllOutliers()" says the outliers to be removed are those beads
> which are 3 MAD from the *mean* values. But it seems that this function
> removes the outliers which are 3 MAD from the *median* values.
> 
> Below is my R scripts:
> -----------------------
> # first get the list of outliers removed by the function "findAllOutliers"
>> library(beadarray)
>> data(BLData)
>> outlier_indices <- findAllOutliers(BLData, 1)
>> outlier_indices[1:10]
> [1]   19  120  133  305  499  816  894  897  932 1339
> 
> # now let's examine how these outliers were detected
>> x <- BLData[[1]]
>> x$ProbeID[19]
> [1] 2   #outlier with index 19 has the ProbeID 2
> # now let's get all the beads with ProbeID 2
>> bead_type <- x$G[x$ProbeID == 2]
>> bead_type
> [1] 1746.807 1871.963 1548.773 1492.135 1859.391 1723.123 1748.783
> 1657.136 1620.981 2038.145 1977.975 1694.616 1574.794 1973.933 1854.132
> 2152.593
> [17] 1771.693 1673.885 2344.341 1681.198 1591.964
> # detect the outliers which are 3 MAD from the *mean* value, we got 0 hits
>> sum(bead_type > mean(bead_type) + 3*mad(bead_type))
> [1] 0
>> sum(bead_type < mean(bead_type) - 3*mad(bead_type))
> [1] 0
> # detect the outliers which are 3 MAD from the *median* value, we got 1 hit
>> sum(bead_type > median(bead_type) + 3*mad(bead_type))
> [1] 1
>> sum(bead_type < median(bead_type) - 3*mad(bead_type))
> [1] 0
> # this hit gives exactly the same bead as the one detected by the
> function "findAllOutliers"
>> bead_type[bead_type > median(bead_type) + 3*mad(bead_type)]
> [1] 2344.341
>> x$G[19]
> [1] 2344.341
> 
> # now let's examine the outliers detected by the function
> "findAllOutliers" for another bead type
>> x$ProbeID[120]
> [1] 23
>> x$ProbeID[133]
> [1] 23
> # so beads with ProbeID 23 have two outliers detected by the function
> "findAllOutliers"
>> bead_type <- x$G[x$ProbeID == 23]
>> bead_type
>  [1] 1806.182 1590.193 1502.012 1934.863 1149.456 1734.869 1651.342
> 1541.102 1531.266 1735.293 1279.364 1642.195 1512.581 1583.358 1813.992
> 1624.385
> [17] 1585.761 1138.922 1473.952 1467.488 1404.607 1647.807 1577.068
> # 3 outliers were detected if 3 MAD from the *mean* value was used,
> which is greater than the number of outliers detected by the function
> "findAllOutliers"
>> sum(bead_type > mean(bead_type) + 3*mad(bead_type))
> [1] 1
>> sum(bead_type < mean(bead_type) - 3*mad(bead_type))
> [1] 2
> # If we used 3 MAD from the *median* value to detect outliers, we got
> the exactly same results with the function "findAllOutliers"
>> sum(bead_type > median(bead_type) + 3*mad(bead_type))
> [1] 0
>> sum(bead_type < median(bead_type) - 3*mad(bead_type))
> [1] 2
>> bead_type[bead_type < median(bead_type) - 3*mad(bead_type)]
> [1] 1149.456 1138.922
>> x$G[c(120,133)]
> [1] 1149.456 1138.922
> ---------------------------
> 
>     I do not know it is the mistake of the documentation or the mistake
> of the function implementation. I had a look at the Illumina Manuals for
> the outlier removal. They only say Median Absolute Deviation method is
> used to remove the outliers, but they do not say they use mean value or
> median value when removing outliers. It does not seem to make sense to
> use median absolute deviation together with the mean value.
> 
>     I am using R version 2.5.0 and beadarray package version 1.4.0.
> There is same problem with beadarray version 1.2.2. I actually
> identified this problem in the version 1.2.2 and then confirmed this
> problem in the latest version.
> 
> Cheers,
> -------------------------
> Dr. Wei Shi
> Bioinformatics Division
> The Walter and Eliza Hall Institute of Medical Research
> 1G Royal Parade, Parkville, VIC
> Australia



More information about the Bioconductor mailing list