[BioC] Possible issue with detection p-values in Lumi package

Wei Shi shi at wehi.EDU.AU
Wed Jan 5 23:53:24 CET 2011


Dear Jovana:

	Thanks for your very detailed report of the problems you have encountered while processing your BeadChip data.

	The Detection/Detection Pval values output by Illumina BeadStudio/GenomeStudio are always confusing (different versions of BeadStudio can report detection values in opposite directions). For the Illumina dataset used in the Limma User's Guide, an expressed probe has a small detection value (close to 0). This is why probes with a detection value less than 0.05 were selected as expressed probes.

	You should always check the direction of the detection values so that non-expressed probes are filtered out correctly. One of the arrays in your data is enough for this: the detection values of the probes with the largest (or smallest) intensities on that array will tell you the direction.
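A minimal base-R sketch of this check follows. The helper name `detection_direction`, the default of the 100 brightest probes and the 0.5 cut-off are hypothetical choices, not part of limma or lumi. The idea is only that the brightest probes on an array are almost certainly expressed, so their typical detection value shows which end of the scale means "expressed".

```r
# Hypothetical helper, not a limma or lumi function: guess which end of
# the detection scale means "expressed", using a single array.
detection_direction <- function(intensity, detection, n_top = 100) {
  stopifnot(length(intensity) == length(detection))
  n_top <- min(n_top, length(intensity))
  # Positions of the n_top brightest probes on this array
  top <- order(intensity, decreasing = TRUE)[seq_len(n_top)]
  # Bright probes are almost certainly expressed, so their median
  # detection value indicates the direction of the scale
  if (median(detection[top]) < 0.5) {
    "low"   # small values = expressed (p-value scale, as in the Limma guide)
  } else {
    "high"  # large values = expressed (1 - p-value scale)
  }
}
```

Assuming a limma object `x` read as in the User's Guide, with detection values stored in `x$other$Detection`, a call might look like `detection_direction(x$E[, 1], x$other$Detection[, 1])`.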

	BTW, the command "expressed = apply(x$other$Detection < 0.05, 1, any)" tells you which probes are expressed in at least ONE array (assuming expressed probes have detection values close to 0). It does not give you the probes that are expressed in ALL arrays. The purpose of probe filtering is to remove probes that are not expressed in any of the arrays, so as to improve the power to detect differentially expressed genes.
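The difference between "any" and "all" can be seen on a toy matrix; the probe names and values below are made up for illustration. `rowSums()` on the logical matrix gives the number of arrays on which each probe passes, which covers both cases and anything in between.

```r
# Toy detection p-value matrix: 3 probes x 3 arrays (made-up values)
det <- matrix(c(0.01, 0.02, 0.03,   # probe1: expressed on every array
                0.01, 0.60, 0.70,   # probe2: expressed on one array only
                0.50, 0.60, 0.70),  # probe3: not expressed on any array
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("probe", 1:3), paste0("array", 1:3)))

in_any <- apply(det < 0.05, 1, any)  # Limma guide filter: at least one array
in_all <- apply(det < 0.05, 1, all)  # stricter: every array
n_expr <- rowSums(det < 0.05)        # number of arrays on which each probe passes
```

Here `in_any` keeps probe1 and probe2, while `in_all` keeps only probe1. On real data `det` would be `x$other$Detection`, and only the `any` version corresponds to the filter in the Limma User's Guide.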

	Hope this helps.

Cheers,
Wei

On Jan 5, 2011, at 4:55 PM, Jovana Maksimovic wrote:

> Dear Bioconductor Users,
> I am currently analysing some Illumina expression data (HumanWG-6_V3, GenomeStudio v1.6) and have noticed an issue when comparing the data processed with Lumi and with Limma.
> I initially processed the data with Limma as described in the Limma User's Guide (p. 89):
> 
> expressed = apply(x$other$Detection < 0.05, 1, any)
> y = y[expressed,] ## y now contains only the probes expressed in ALL samples = ~8500 probes
> 
> However, with lumi:
> 
> raw = lumiR("file.txt")
> norm = lumiExpresso(raw, QC.evaluation = TRUE)
> x = exprs(norm)
> presentCount = detectionCall(norm) ## pass the LumiBatch object (with its detection p-values), not the expression matrix
> y = x[presentCount > 0,] ## ~20,000 probes
> 
> When I compared the actual detection values in the Limma and Lumi objects, I found that they differ. The detection p-values in the limma object are the same as those in the text file output directly by GenomeStudio, whereas the values in the lumi object are the raw values subtracted from 1 (i.e. 1 - raw).
> Examples of the three detection tables are shown below.
> This appears to be the relevant lumiR code:
> 
> if (length(grep("Detection Pval", header, ignore.case = TRUE)) == 0) {
>     detection <- 1 - detection
> }
> 
> In the GenomeStudio file I am working with, the detection p-value columns are labelled like "Detection-4457260019_A", so the function does not find the "Detection Pval" string and therefore converts the detection values. I confirmed this by renaming my detection headings to "Detection Pval-4457260019_A": the detection values were then not converted and remained equal to the "raw" and Limma detection values.
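The header test can be reproduced in base R. The two strings below are the original GenomeStudio v1.6 header and the renamed one from the experiment just described; according to the lumiR code quoted above, a header with no match for "Detection Pval" triggers `detection <- 1 - detection`.

```r
# Original GenomeStudio v1.6 header and the manually renamed header
headers <- c("Detection-4457260019_A", "Detection Pval-4457260019_A")

# Same test as the quoted lumiR code: zero matches means the values get flipped
matches <- sapply(headers, function(h)
  length(grep("Detection Pval", h, ignore.case = TRUE)))
flipped <- matches == 0
```

`flipped` is `TRUE` for the original header and `FALSE` for the renamed one, which matches the behaviour reported above.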
> 
> According to the GenomeStudio v1.0 manual regarding detection p-values:
> 
> If the Z score for the probe intensity is smaller than the lowest
> negative control Z score, the function returns a 0 and the
> p-value is 1.
> If the Z score for the probe intensity falls within the range of the Z
> scores of the negative controls, R is the rank of the Z score of the
> probe, and the p-value is in the range of 0 to 1.
> If the Z score for the probe intensity is greater than the largest
> negative control Z score, the function returns a 1 and the
> p-value is 0.
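The three cases above are consistent with a detection score of R/N, where R is the number of negative-control Z scores below the probe's Z score and N is the number of negative controls, with p-value = 1 - R/N. The sketch below assumes that reading; it is not taken from Illumina's code, and the function name `detection_pvalue` is hypothetical.

```r
# Assumed reading of the manual, not Illumina's implementation:
#   R = number of negative-control Z scores below the probe's Z score
#   N = number of negative controls
#   detection score = R / N, p-value = 1 - R / N
detection_pvalue <- function(z_probe, z_neg) {
  N <- length(z_neg)
  R <- vapply(z_probe, function(z) sum(z_neg < z), integer(1))
  1 - R / N
}

z_neg <- c(-1, 0, 1, 2)                      # made-up negative-control Z scores
p <- detection_pvalue(c(-5, 0.5, 10), z_neg)
# below all controls -> 1, inside the range -> 0.5, above all controls -> 0
```

Under this reading an expressed probe (high Z score) gets a p-value near 0, as in the GenomeStudio file.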
> 
> This suggests that in data generated by current releases of GenomeStudio the detection p-value of an expressed probe should be close to 0. I know that in some older versions of BeadStudio the detection value for expressed probes was actually close to 1, and Lumi was built to take this into account; however, I see no reason why the detection values in our data should be converted, as they were generated by a relatively recent version of GenomeStudio. I suspect that Illumina has changed its column naming scheme and that this has not yet been reflected in Lumi. This error can have a significant impact on people's results, so I felt it was necessary to bring it to the group's attention.
> 
> 
> ## Detection p-values as seen in the Limma object
> $other
> $Detection
>        4457260019_A 4457260019_B 4457260019_C 4457260019_D 4457260019_E
> 6450255      0.27536      0.51647      0.06983      0.89065      0.46245
> 2570615      0.97233      0.97892      0.98682      0.98814      0.98814
> 6370619      0.89196      0.72727      0.86825      0.96706      0.88669
> 2600039      0.71014      0.02899      0.39921      0.14361      0.53491
> 2650615      0.85375      0.60079      0.88274      0.94071      0.40711
>        4457260019_F 4463361183_A 4463361183_B 4463361183_C 4463361183_D
> 6450255      0.42161      0.55072      0.65613      0.22398      0.29117
> 2570615      0.97628      0.77339      0.98155      0.98287      0.98946
> 6370619      0.84848      0.85507      0.94993      0.98287      0.91963
> 2600039      0.21476      0.54414      0.46377      0.45982      0.32016
> 2650615      0.38603      0.92754      0.57312      0.57181      0.64559
>        4463361183_E 4463361183_F 5511070019_A 5511070019_B 5511070021_A
> 6450255      0.42951      0.35705      0.25823      0.23979      0.31094
> 2570615      0.97892      0.99209      0.97760      0.99341      0.95652
> 6370619      0.77339      0.89855      0.78920      0.75362      0.43478
> 2600039      0.35968      0.23979      0.17391      0.40975      0.72596
> 2650615      0.49407      0.16996      0.57312      0.52306      0.46113
>        5511070021_B 5511070021_C 5511070021_D 5511070021_E 5511070021_F
> 6450255      0.82213      0.48353      0.37681      0.26482      0.27536
> 2570615      0.98024      0.97497      0.95784      0.93412      0.96970
> 6370619      0.48748      0.59947      0.48880      0.50988      0.82213
> 2600039      0.20422      0.24769      0.49144      0.57049      0.43742
> 2650615      0.52306      0.37418      0.33202      0.39789      0.60079
> 49582 more rows ...
> 
> ## Detection p-values as seen in the Lumi object
>> detect[1:5,]
>        4457260019_A 4457260019_B 4457260019_C 4457260019_D 4457260019_E
> 6450255      0.72464      0.48353      0.93017      0.10935      0.53755
> 2570615      0.02767      0.02108      0.01318      0.01186      0.01186
> 6370619      0.10804      0.27273      0.13175      0.03294      0.11331
> 2600039      0.28986      0.97101      0.60079      0.85639      0.46509
> 2650615      0.14625      0.39921      0.11726      0.05929      0.59289
>        4457260019_F 4463361183_A 4463361183_B 4463361183_C 4463361183_E
> 6450255      0.57839      0.44928      0.34387      0.77602      0.57049
> 2570615      0.02372      0.22661      0.01845      0.01713      0.02108
> 6370619      0.15152      0.14493      0.05007      0.01713      0.22661
> 2600039      0.78524      0.45586      0.53623      0.54018      0.64032
> 2650615      0.61397      0.07246      0.42688      0.42819      0.50593
>        4463361183_F 5511070019_A 5511070021_A 5511070021_B 5511070021_C
> 6450255      0.64295      0.74177      0.68906      0.17787      0.51647
> 2570615      0.00791      0.02240      0.04348      0.01976      0.02503
> 6370619      0.10145      0.21080      0.56522      0.51252      0.40053
> 2600039      0.76021      0.82609      0.27404      0.79578      0.75231
> 2650615      0.83004      0.42688      0.53887      0.47694      0.62582
>        5511070021_D 5511070021_E 5511070021_F
> 6450255      0.62319      0.73518      0.72464
> 2570615      0.04216      0.06588      0.03030
> 6370619      0.51120      0.49012      0.17787
> 2600039      0.50856      0.42951      0.56258
> 2650615      0.66798      0.60211      0.39921
> 
> ## Detection p-values read as a text file from GenomeStudio output
>> raw.detect[1:5,]
>        Detection.4457260019_A Detection.4457260019_B Detection.4457260019_C
> 6450255                0.27536                0.51647                0.06983
> 2570615                0.97233                0.97892                0.98682
> 6370619                0.89196                0.72727                0.86825
> 2600039                0.71014                0.02899                0.39921
> 2650615                0.85375                0.60079                0.88274
>        Detection.4457260019_D Detection.4457260019_E Detection.4457260019_F
> 6450255                0.89065                0.46245                0.42161
> 2570615                0.98814                0.98814                0.97628
> 6370619                0.96706                0.88669                0.84848
> 2600039                0.14361                0.53491                0.21476
> 2650615                0.94071                0.40711                0.38603
>        Detection.4463361183_A Detection.4463361183_B Detection.4463361183_C
> 6450255                0.55072                0.65613                0.22398
> 2570615                0.77339                0.98155                0.98287
> 6370619                0.85507                0.94993                0.98287
> 2600039                0.54414                0.46377                0.45982
> 2650615                0.92754                0.57312                0.57181
>        Detection.4463361183_D Detection.4463361183_E Detection.4463361183_F
> 6450255                0.29117                0.42951                0.35705
> 2570615                0.98946                0.97892                0.99209
> 6370619                0.91963                0.77339                0.89855
> 2600039                0.32016                0.35968                0.23979
> 2650615                0.64559                0.49407                0.16996
>        Detection.5511070019_A Detection.5511070019_B Detection.5511070021_A
> 6450255                0.25823                0.23979                0.31094
> 2570615                0.97760                0.99341                0.95652
> 6370619                0.78920                0.75362                0.43478
> 2600039                0.17391                0.40975                0.72596
> 2650615                0.57312                0.52306                0.46113
>        Detection.5511070021_B Detection.5511070021_C Detection.5511070021_D
> 6450255                0.82213                0.48353                0.37681
> 2570615                0.98024                0.97497                0.95784
> 6370619                0.48748                0.59947                0.48880
> 2600039                0.20422                0.24769                0.49144
> 2650615                0.52306                0.37418                0.33202
>        Detection.5511070021_E Detection.5511070021_F
> 6450255                0.26482                0.27536
> 2570615                0.93412                0.96970
> 6370619                0.50988                0.82213
> 2600039                0.57049                0.43742
> 2650615                0.39789                0.60079
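The 1 - raw relationship can be checked directly on the first column (4457260019_A) of the tables above; the five values below are copied from the Limma and Lumi outputs.

```r
# Column 4457260019_A, probes 6450255, 2570615, 6370619, 2600039, 2650615
limma_det <- c(0.27536, 0.97233, 0.89196, 0.71014, 0.85375)  # same as the GenomeStudio file
lumi_det  <- c(0.72464, 0.02767, 0.10804, 0.28986, 0.14625)  # after lumiR

# TRUE if every lumi value equals 1 minus the raw value (up to rounding)
isTRUE(all.equal(lumi_det, 1 - limma_det))
```

Running the same comparison over the full matrices is a quick way to tell whether an object has been flipped before choosing a filtering threshold.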
> 
>> sessionInfo()
> R version 2.12.0 (2010-10-15)
> Platform: x86_64-unknown-linux-gnu (64-bit)
> 
> locale:
> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
> [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
> [1] lumi_2.2.1     Biobase_2.10.0 limma_3.6.9
> 
> loaded via a namespace (and not attached):
> [1] affy_1.28.0           affyio_1.18.0         annotate_1.28.0
> [4] AnnotationDbi_1.12.0  DBI_0.2-5             grid_2.12.0
> [7] hdrcde_2.15           KernSmooth_2.23-4     lattice_0.19-13
> [10] MASS_7.3-8            Matrix_0.999375-44    methylumi_1.6.1
> [13] mgcv_1.7-0            nlme_3.1-97           preprocessCore_1.12.0
> [16] RSQLite_0.9-2         xtable_1.5-6
> 
> 
> Jovana Maksimovic B.Sc (Hons) / B.Binf
> Bioinformatics Officer
> Bioinformatics, Enabling Facilities
> 
> Murdoch Childrens Research Institute
> The Royal Children’s Hospital
> Flemington Road Parkville Victoria 3052 Australia 
> E jovana.maksimovic at mcri.edu.au
> www.mcri.edu.au
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

