[BioC] limma_3.17.23 - missing ILMN identifiers in EList objects after read.ilmn
Kemal Akat
kakat at mail.rockefeller.edu
Wed Oct 9 20:39:45 CEST 2013
Dear colleagues,
I am currently analyzing an Illumina Mouse v2 bead array dataset using limma and ran into an error I don't quite understand. It came up when I tried to annotate the differentially expressed genes later in the analysis. The problem seems to stem from empty strings in the vector of probe IDs I pass to retrieve the annotation, but I don't understand how they can get there in the first place.
The probe and control profiles were exported from GenomeStudio without background correction and normalization.
Here is the code I ran:
R> x = read.ilmn(files = "ProbeProfile.txt", ctrlfiles = "ControlProbeProfile.txt", probeid = "Probe_ID", annotation = "TargetID", other.columns = c("Detection", "Avg_NBEADS"), verbose = FALSE)
R> y = neqc(x)
R> expressed = rowSums(y$other$Detection < 0.05) > 4
R> y = y[expressed, ]
R> ids = rownames(y)
R> entrez = unlist(mget(ids, illuminaMousev2ENTREZID, ifnotfound = NA))
Error in unlist(mget(ids, illuminaMousev2ENTREZID, ifnotfound = NA)) :
error in evaluating the argument 'x' in selecting a method for function 'unlist': Error in FUN(c("ILMN_2735294", "ILMN_2417611", "ILMN_2545897", "ILMN_2762289", :
attempt to use zero-length variable name
Calls: mget ... as.list -> as.list -> .formatList -> lapply -> lapply -> FUN
R> traceback()
1: unlist(mget(ids, illuminaMousev2ENTREZID, ifnotfound = NA))
R> ids[ids == ""]
[1] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
[55] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
[109] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
[163] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
[217] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
[271] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
[325] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
[379] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
[433] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
[487] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
[541] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
[595] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
[649] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
[703] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
[757] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
[811] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
[865] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
[919] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
[973] "" ""
So there seem to be 974 empty strings among the row names, but there is nothing like that in the original data file. In addition, I would not have expected R to accept empty row names in the first place.
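As a stopgap while the origin of the empty names is unclear, the lookup vector can be filtered before it reaches mget. The following is only a minimal sketch on a toy vector standing in for rownames(y); the names has_id, n_empty and valid_ids are made up for illustration, and the commented-out last line assumes illuminaMousev2.db is loaded as in the session above.

```r
# Toy stand-in for rownames(y); the real vector comes from the EList object.
ids <- c("ILMN_2735294", "ILMN_2417611", "", "ILMN_2545897", "")

# nzchar() is FALSE for "", and the is.na() check guards against NA names,
# for which nzchar() returns TRUE by default.
has_id <- !is.na(ids) & nzchar(ids)

n_empty   <- sum(!has_id)   # number of empty or missing names
valid_ids <- ids[has_id]    # identifiers that are safe to look up

# With the annotation package loaded, the lookup from above would become:
# entrez <- unlist(mget(valid_ids, illuminaMousev2ENTREZID, ifnotfound = NA))
```

This only sidesteps the error; it does not explain why the empty names are there.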
Here is what the EListRaw object looks like after reading it into R:
R> x = read.ilmn(files = "ProbeProfile.txt", ctrlfiles = "ControlProbeProfile.txt", probeid = "Probe_ID", annotation = "TargetID", other.columns = c("Detection", "Avg_NBEADS"), verbose = FALSE)
R> x
An object of class "EListRaw"
$source
[1] "illumina"
$E
9379087005_A 9379087005_B 9379087022_A 9379087022_B 9379087005_C 9379087005_D 9379087022_C 9379087022_D 9379087005_E 9379087005_F 9379087022_E
ILMN_2735294 420.8 401.8 395.8 422.9 360.1 358.5 420.7 327.1 178.8 343.4 425.5
ILMN_2417611 323.8 280.2 294.1 315.5 542.5 301.0 398.0 133.7 235.9 382.0 512.7
ILMN_2545897 98.3 109.2 128.0 124.5 111.3 102.6 110.2 106.6 87.2 104.6 101.8
ILMN_2762289 91.7 88.3 94.2 95.5 88.1 81.2 88.5 88.0 79.4 85.3 84.5
ILMN_1248788 87.6 84.7 92.0 92.9 85.9 84.0 93.8 86.9 77.5 84.9 86.3
9379087022_F
ILMN_2735294 322.0
ILMN_2417611 185.7
ILMN_2545897 107.8
ILMN_2762289 88.8
ILMN_1248788 85.1
46250 more rows ...
$genes
TargetID Status
1 0610005A07RIK regular
2 0610005C13RIK regular
3 0610005H09RIK regular
4 0610005I04 regular
5 0610005K03RIK regular
46250 more rows ...
$other
$Detection
9379087005_A 9379087005_B 9379087022_A 9379087022_B 9379087005_C 9379087005_D 9379087022_C 9379087022_D 9379087005_E 9379087005_F 9379087022_E
ILMN_2735294 0.00000 0.00000 0.0000 0.0000 0.0000 0.0000 0.00000 0.0000 0.00000 0.00000 0.00000
ILMN_2417611 0.00000 0.00000 0.0000 0.0000 0.0000 0.0000 0.00000 0.0000 0.00000 0.00000 0.00000
ILMN_2545897 0.08974 0.00321 0.0000 0.0000 0.0000 0.0000 0.00107 0.0000 0.00214 0.00214 0.00107
ILMN_2762289 0.34402 0.49359 0.1998 0.1827 0.6068 0.9220 0.71047 0.4776 0.27350 0.58654 0.77991
ILMN_1248788 0.76603 0.86004 0.3472 0.3718 0.8440 0.6645 0.21902 0.6004 0.58120 0.63675 0.53419
9379087022_F
ILMN_2735294 0.0000
ILMN_2417611 0.0000
ILMN_2545897 0.0000
ILMN_2762289 0.3440
ILMN_1248788 0.7949
46250 more rows ...
$Avg_NBEADS
9379087005_A 9379087005_B 9379087022_A 9379087022_B 9379087005_C 9379087005_D 9379087022_C 9379087022_D 9379087005_E 9379087005_F 9379087022_E
ILMN_2735294 51 63 58 57 36 46 49 60 62 50 58
ILMN_2417611 44 56 46 51 66 51 42 66 40 47 57
ILMN_2545897 51 69 45 67 47 39 44 56 59 43 50
ILMN_2762289 48 49 53 59 43 55 47 49 54 41 53
ILMN_1248788 43 42 29 38 39 42 36 36 29 31 45
9379087022_F
ILMN_2735294 50
ILMN_2417611 56
ILMN_2545897 58
ILMN_2762289 42
ILMN_1248788 38
46250 more rows ...
Now looking at the end of the expression matrix, where the row names are empty:
R> tail(x$E)
9379087005_A 9379087005_B 9379087022_A 9379087022_B 9379087005_C 9379087005_D 9379087022_C 9379087022_D 9379087005_E 9379087005_F 9379087022_E 9379087022_F
92.2 92.6 92.6 93.8 92.1 86.9 91.4 85.7 78.9 86.5 89.0 91.7
89.2 85.7 92.3 89.9 85.9 83.7 91.3 89.5 76.6 91.4 86.3 85.8
89.8 85.5 92.7 92.1 92.7 87.3 90.1 86.2 79.1 83.7 86.4 84.9
96.9 88.9 92.4 94.6 90.7 87.9 96.2 85.6 78.0 82.0 86.4 84.1
87.8 83.5 85.9 90.2 81.6 81.5 92.5 83.8 73.1 80.6 86.1 86.8
89.8 87.4 87.1 89.6 88.1 84.4 91.9 85.7 80.5 88.3 86.8 86.3
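One check that might narrow this down is to cross-tabulate the Status column of x$genes against whether the row name is empty. The sketch below runs on toy data; the Status values other than "regular" are hypothetical placeholders, and the idea that the unnamed rows come from the control probe file is only an assumption to be tested, not something established here.

```r
# Toy stand-ins for rownames(x$E) and x$genes$Status (hypothetical values).
probe_ids <- c("ILMN_2735294", "ILMN_2417611", "", "", "")
status    <- c("regular", "regular", "NEGATIVE", "NEGATIVE", "HOUSEKEEPING")

# TRUE where the row name is the empty string.
empty_name <- !nzchar(probe_ids)

# Cross-tabulate probe status against empty row names.
tab <- table(status = status, empty_name = empty_name)
print(tab)

# On the real object the equivalent would be:
# table(x$genes$Status, !nzchar(rownames(x$E)))
```

If every empty row name falls in a non-regular Status class, that would point at how the control file is being read rather than at the probe profile itself.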
R> sessionInfo()
R Under development (unstable) (2013-06-26 r63071)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] splines parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] xtable_1.7-1 vsn_3.29.1 reshape2_1.2.2 ratr_1.0 pheatmap_0.7.4 illuminaMousev2.db_1.18.0
[7] org.Mm.eg.db_2.9.0 GOstats_2.27.1 graph_1.39.3 ggplot2_0.9.3.1 edgeR_3.3.8 limma_3.17.23
[13] codetools_0.2-8 Category_2.27.3 GO.db_2.9.0 RSQLite_0.11.4 DBI_0.2-7 Matrix_1.0-12
[19] lattice_0.20-15 Biostrings_2.29.19 XVector_0.1.4 IRanges_1.19.37 AnnotationDbi_1.23.23 Biobase_2.21.7
[25] BiocGenerics_0.7.5 knitr_1.4.1 setwidth_1.0-3
loaded via a namespace (and not attached):
[1] affy_1.39.2 affyio_1.29.0 annotate_1.39.0 AnnotationForge_1.3.22 BiocInstaller_1.11.4 colorspace_1.2-2 dichromat_2.0-0
[8] digest_0.6.3 evaluate_0.4.7 formatR_0.9 genefilter_1.43.0 grid_3.1.0 GSEABase_1.23.0 gtable_0.1.2
[15] highr_0.2.1 labeling_0.2 MASS_7.3-26 munsell_0.4 plyr_1.8 preprocessCore_1.23.0 proto_0.3-10
[22] RBGL_1.37.2 RColorBrewer_1.0-5 scales_0.2.3 stats4_3.1.0 stringr_0.6.2 survival_2.37-4 tools_3.1.0
[29] XML_3.98-1.1 zlibbioc_1.7.0
R>
Any help and explanations appreciated!
Cheers,
Kemal
--
Kemal Akat
Laboratory of RNA Molecular Biology
The Rockefeller University
1230 York Avenue, Box #186
New York, NY 10065