[BioC] Possible issue with detection p-values in Lumi package
Jovana Maksimovic
jovana.maksimovic at mcri.edu.au
Wed Jan 5 06:55:46 CET 2011
Dear Bioconductor Users,
I am currently analysing some Illumina expression data (HumanWG-6_V3,GenomeStudio v1.6), and have noticed an issue when comparing data processed using both Lumi and Limma.
I initially processed the data using Limma as described in the Limma User's Guide (p89).
expressed = apply(x$other$Detection < 0.05, 1, any)
y = y[expressed,] ## y now contains only the probes detected (p < 0.05) in at least one sample
= ~8500 probes
However with lumi:
raw = lumiR("file.txt")
norm = lumiExpresso(raw,QC.evaluation=TRUE)
x = exprs(norm)
presentCount = detectionCall(norm) ## detectionCall expects the LumiBatch object, not the expression matrix
y = x[presentCount > 0,]
= ~20,000 probes
When I compared the actual detection values in the Limma and Lumi objects, I found that they differ. The detection p-values in the limma object are identical to those in the text file exported directly from GenomeStudio, whereas the values in the lumi object are the result of subtracting the raw values from 1.
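The relationship can be checked with a small sketch like the one below. The numbers are copied from the first five probes and first five arrays of the tables at the end of this post. The object names (limma_detect, lumi_detect) are only for illustration, and the 0.05 cutoff is applied to both matrices purely to show how the filter result changes.

```r
probes <- c("6450255", "2570615", "6370619", "2600039", "2650615")

# Detection p-values as stored in the limma object (same as the raw GenomeStudio file).
limma_detect <- matrix(
  c(0.27536, 0.51647, 0.06983, 0.89065, 0.46245,
    0.97233, 0.97892, 0.98682, 0.98814, 0.98814,
    0.89196, 0.72727, 0.86825, 0.96706, 0.88669,
    0.71014, 0.02899, 0.39921, 0.14361, 0.53491,
    0.85375, 0.60079, 0.88274, 0.94071, 0.40711),
  nrow = 5, byrow = TRUE, dimnames = list(probes, NULL)
)

# The same probes and arrays as stored in the lumi object.
lumi_detect <- matrix(
  c(0.72464, 0.48353, 0.93017, 0.10935, 0.53755,
    0.02767, 0.02108, 0.01318, 0.01186, 0.01186,
    0.10804, 0.27273, 0.13175, 0.03294, 0.11331,
    0.28986, 0.97101, 0.60079, 0.85639, 0.46509,
    0.14625, 0.39921, 0.11726, 0.05929, 0.59289),
  nrow = 5, byrow = TRUE, dimnames = list(probes, NULL)
)

# TRUE if the lumi values equal 1 minus the limma values (up to rounding).
is_flipped <- isTRUE(all.equal(lumi_detect, 1 - limma_detect))

# Probes passing "p < 0.05 in at least one array" under each set of values.
kept_raw     <- probes[apply(limma_detect < 0.05, 1, any)]
kept_flipped <- probes[apply(lumi_detect  < 0.05, 1, any)]

print(is_flipped)    # TRUE
print(kept_raw)      # "2600039"
print(kept_flipped)  # "2570615" "6370619"
```

Even on this tiny subset, the two sets of values select different probes, which is consistent with the very different probe counts above.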
Examples of the 3 different detection tables can be seen below.
This appears to be the relevant lumiR code:
if (length(grep("Detection Pval", header, ignore.case = TRUE)) == 0) {
detection <- 1 - detection
}
In the GenomeStudio file that I am working with, the detection p-value columns are labelled "Detection-4457260019_A" and so on, so the function does not find the "Detection Pval" string and therefore converts the detection values. I confirmed this by renaming my detection headings to "Detection Pval-4457260019_A", after which the detection values were not converted and remained equal to the "raw" and Limma detection values.
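A minimal base-R sketch of this behaviour is given below. It only reproduces the condition quoted above and is not the actual lumiR source. The header strings are made-up examples in the two naming styles, and will_flip is a hypothetical helper name.

```r
# Made-up header lines in the two column-naming styles discussed above.
header_plain <- "ProbeID\tAVG_Signal-4457260019_A\tDetection-4457260019_A"
header_pval  <- "ProbeID\tAVG_Signal-4457260019_A\tDetection Pval-4457260019_A"

# Mirrors the quoted condition: TRUE means "detection <- 1 - detection" would run.
will_flip <- function(header) {
  length(grep("Detection Pval", header, ignore.case = TRUE)) == 0
}

print(will_flip(header_plain))  # TRUE: values would be converted
print(will_flip(header_pval))   # FALSE: values would be left as they are

# The renaming workaround described above, applied to the header text.
renamed <- gsub("Detection-", "Detection Pval-", header_plain, fixed = TRUE)
print(will_flip(renamed))       # FALSE
```

In practice the renaming would have to be applied to the header line of the GenomeStudio file before it is passed to lumiR.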
According to the GenomeStudio v1.0 manual regarding detection p-values:
If the Z score for the probe intensity is smaller than the lowest
negative control Z score, the function returns a 0 and the
p-value is 1.
If the Z score for the probe intensity falls within the range of the Z
scores of the negative controls, R is the rank of the Z score of the
probe, and the p-value is in the range of 0 to 1.
If the Z score for the probe intensity is greater than the largest
negative control Z score, the function returns a 1 and the
p-value is 0.
This suggests that the detection p-value for an expressed probe should be close to 0 in data generated by current releases of GenomeStudio. I know that with some older versions of BeadStudio the detection value for expressed probes was actually close to 1, and Lumi was built to take this into account; however, I do not see any reason why the detection values for our data should be converted, as they were generated by a relatively new version of GenomeStudio. I suspect that Illumina has changed their column naming system and that this has not yet been reflected in Lumi. This error can have a significant impact on people's results and I felt it was necessary to bring it to the group's attention.
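To make the rule quoted from the manual concrete, here is a rough sketch of a rank-based detection p-value. It assumes the p-value is 1 - R/N, where R is the number of negative controls whose Z score does not exceed the probe's and N is the number of negative controls. That formula is my reading of the quoted passage rather than something taken from GenomeStudio itself, and the control Z scores are invented.

```r
# Rank-based detection p-value (assumed form: 1 - R/N, see the note above).
detection_p <- function(probe_z, neg_z) {
  r <- sum(neg_z <= probe_z)  # rank of the probe among the negative controls
  1 - r / length(neg_z)
}

# Invented Z scores for five negative control probes.
neg_z <- c(-1.5, -0.5, 0, 0.5, 1.5)

print(detection_p(-3,  neg_z))  # below all controls: p-value 1
print(detection_p(0.2, neg_z))  # within the control range: p-value between 0 and 1
print(detection_p(3,   neg_z))  # above all controls: p-value 0
```

Under this reading a clearly expressed probe has a p-value near 0, which is the convention the limma filter above assumes.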
## Detection p-values as seen in the Limma object
$other
$Detection
4457260019_A 4457260019_B 4457260019_C 4457260019_D 4457260019_E
6450255 0.27536 0.51647 0.06983 0.89065 0.46245
2570615 0.97233 0.97892 0.98682 0.98814 0.98814
6370619 0.89196 0.72727 0.86825 0.96706 0.88669
2600039 0.71014 0.02899 0.39921 0.14361 0.53491
2650615 0.85375 0.60079 0.88274 0.94071 0.40711
4457260019_F 4463361183_A 4463361183_B 4463361183_C 4463361183_D
6450255 0.42161 0.55072 0.65613 0.22398 0.29117
2570615 0.97628 0.77339 0.98155 0.98287 0.98946
6370619 0.84848 0.85507 0.94993 0.98287 0.91963
2600039 0.21476 0.54414 0.46377 0.45982 0.32016
2650615 0.38603 0.92754 0.57312 0.57181 0.64559
4463361183_E 4463361183_F 5511070019_A 5511070019_B 5511070021_A
6450255 0.42951 0.35705 0.25823 0.23979 0.31094
2570615 0.97892 0.99209 0.97760 0.99341 0.95652
6370619 0.77339 0.89855 0.78920 0.75362 0.43478
2600039 0.35968 0.23979 0.17391 0.40975 0.72596
2650615 0.49407 0.16996 0.57312 0.52306 0.46113
5511070021_B 5511070021_C 5511070021_D 5511070021_E 5511070021_F
6450255 0.82213 0.48353 0.37681 0.26482 0.27536
2570615 0.98024 0.97497 0.95784 0.93412 0.96970
6370619 0.48748 0.59947 0.48880 0.50988 0.82213
2600039 0.20422 0.24769 0.49144 0.57049 0.43742
2650615 0.52306 0.37418 0.33202 0.39789 0.60079
49582 more rows ...
## Detection p-values as seen in the Lumi object
> detect[1:5,]
4457260019_A 4457260019_B 4457260019_C 4457260019_D 4457260019_E
6450255 0.72464 0.48353 0.93017 0.10935 0.53755
2570615 0.02767 0.02108 0.01318 0.01186 0.01186
6370619 0.10804 0.27273 0.13175 0.03294 0.11331
2600039 0.28986 0.97101 0.60079 0.85639 0.46509
2650615 0.14625 0.39921 0.11726 0.05929 0.59289
4457260019_F 4463361183_A 4463361183_B 4463361183_C 4463361183_E
6450255 0.57839 0.44928 0.34387 0.77602 0.57049
2570615 0.02372 0.22661 0.01845 0.01713 0.02108
6370619 0.15152 0.14493 0.05007 0.01713 0.22661
2600039 0.78524 0.45586 0.53623 0.54018 0.64032
2650615 0.61397 0.07246 0.42688 0.42819 0.50593
4463361183_F 5511070019_A 5511070021_A 5511070021_B 5511070021_C
6450255 0.64295 0.74177 0.68906 0.17787 0.51647
2570615 0.00791 0.02240 0.04348 0.01976 0.02503
6370619 0.10145 0.21080 0.56522 0.51252 0.40053
2600039 0.76021 0.82609 0.27404 0.79578 0.75231
2650615 0.83004 0.42688 0.53887 0.47694 0.62582
5511070021_D 5511070021_E 5511070021_F
6450255 0.62319 0.73518 0.72464
2570615 0.04216 0.06588 0.03030
6370619 0.51120 0.49012 0.17787
2600039 0.50856 0.42951 0.56258
2650615 0.66798 0.60211 0.39921
## Detection p-values read as a text file from GenomeStudio output
> raw.detect[1:5,]
Detection.4457260019_A Detection.4457260019_B Detection.4457260019_C
6450255 0.27536 0.51647 0.06983
2570615 0.97233 0.97892 0.98682
6370619 0.89196 0.72727 0.86825
2600039 0.71014 0.02899 0.39921
2650615 0.85375 0.60079 0.88274
Detection.4457260019_D Detection.4457260019_E Detection.4457260019_F
6450255 0.89065 0.46245 0.42161
2570615 0.98814 0.98814 0.97628
6370619 0.96706 0.88669 0.84848
2600039 0.14361 0.53491 0.21476
2650615 0.94071 0.40711 0.38603
Detection.4463361183_A Detection.4463361183_B Detection.4463361183_C
6450255 0.55072 0.65613 0.22398
2570615 0.77339 0.98155 0.98287
6370619 0.85507 0.94993 0.98287
2600039 0.54414 0.46377 0.45982
2650615 0.92754 0.57312 0.57181
Detection.4463361183_D Detection.4463361183_E Detection.4463361183_F
6450255 0.29117 0.42951 0.35705
2570615 0.98946 0.97892 0.99209
6370619 0.91963 0.77339 0.89855
2600039 0.32016 0.35968 0.23979
2650615 0.64559 0.49407 0.16996
Detection.5511070019_A Detection.5511070019_B Detection.5511070021_A
6450255 0.25823 0.23979 0.31094
2570615 0.97760 0.99341 0.95652
6370619 0.78920 0.75362 0.43478
2600039 0.17391 0.40975 0.72596
2650615 0.57312 0.52306 0.46113
Detection.5511070021_B Detection.5511070021_C Detection.5511070021_D
6450255 0.82213 0.48353 0.37681
2570615 0.98024 0.97497 0.95784
6370619 0.48748 0.59947 0.48880
2600039 0.20422 0.24769 0.49144
2650615 0.52306 0.37418 0.33202
Detection.5511070021_E Detection.5511070021_F
6450255 0.26482 0.27536
2570615 0.93412 0.96970
6370619 0.50988 0.82213
2600039 0.57049 0.43742
2650615 0.39789 0.60079
> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] lumi_2.2.1 Biobase_2.10.0 limma_3.6.9
loaded via a namespace (and not attached):
[1] affy_1.28.0 affyio_1.18.0 annotate_1.28.0
[4] AnnotationDbi_1.12.0 DBI_0.2-5 grid_2.12.0
[7] hdrcde_2.15 KernSmooth_2.23-4 lattice_0.19-13
[10] MASS_7.3-8 Matrix_0.999375-44 methylumi_1.6.1
[13] mgcv_1.7-0 nlme_3.1-97 preprocessCore_1.12.0
[16] RSQLite_0.9-2 xtable_1.5-6
Jovana Maksimovic B.Sc (Hons) / B.Binf
Bioinformatics Officer
Bioinformatics, Enabling Facilities
Murdoch Childrens Research Institute
The Royal Children’s Hospital
Flemington Road Parkville Victoria 3052 Australia
E jovana.maksimovic at mcri.edu.au
www.mcri.edu.au