[BioC] limma_3.17.23 - missing ILMN identifiers in EList objects after read.ilmn
Wei Shi
shi at wehi.EDU.AU
Thu Oct 10 01:10:46 CEST 2013
Dear Kemal,
The rows with empty names are likely to be control probes, because read.ilmn always puts control probes at the end of the data matrix (x in your data). These probes should, however, have been removed when you ran the neqc function, and that does not seem to have happened. Could you please run the following command so that I can see whether neqc successfully identified the control probes?
table(x$genes$Status)
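
To illustrate what that check looks for, here is a minimal sketch in base R. It does not use limma or your files: the list x below is a toy stand-in for the object returned by read.ilmn, and every value in it (intensities, array names, the "negative" status label) is invented for illustration. It tabulates the status classes, checks whether the unnamed rows coincide with the non-regular probes, and then keeps only the regular probes.

```r
# Toy stand-in for the object returned by read.ilmn(); all values are invented.
E <- matrix(c(420.8, 323.8, 92.2, 89.2,
              401.8, 280.2, 92.6, 85.7),
            nrow = 4,
            dimnames = list(c("ILMN_2735294", "ILMN_2417611", "", ""),
                            c("array_A", "array_B")))
genes <- data.frame(TargetID = c("0610005A07RIK", "0610005C13RIK",
                                 "NEGATIVE", "NEGATIVE"),
                    Status = c("regular", "regular", "negative", "negative"),
                    stringsAsFactors = FALSE)
x <- list(E = E, genes = genes)

# How many probes fall into each status class?
status_counts <- table(x$genes$Status)
print(status_counts)

# Do the unnamed rows coincide with the non-regular probes?
unnamed <- !nzchar(rownames(x$E))
cross_tab <- table(unnamed = unnamed, status = x$genes$Status)
print(cross_tab)

# Keep only the regular probes by hand (a manual filter, not a call to neqc).
keep <- x$genes$Status == "regular"
x_regular <- list(E = x$E[keep, , drop = FALSE],
                  genes = x$genes[keep, , drop = FALSE])
```

If every unnamed row falls in a non-regular status class, the control probes were identified correctly and can be filtered out by status; if some unnamed rows are labelled "regular", the problem lies further upstream.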
Best regards,
Wei
On Oct 10, 2013, at 5:39 AM, Kemal Akat wrote:
> Dear colleagues,
>
> I am currently analyzing an Illumina Mouse v2 bead array dataset using limma and ran into an error I don't quite understand. It occurred when I tried to annotate the differentially expressed genes later in the analysis. The problem seems to stem from empty strings in the vector I provide to retrieve the annotation info, but I don't understand how this can happen in the first place.
>
> The probe and control profiles were exported from GenomeStudio without background correction and normalization.
>
> Here is the code I ran:
>
> R> x = read.ilmn(files = "ProbeProfile.txt", ctrlfiles = "ControlProbeProfile.txt", probeid = "Probe_ID", annotation = "TargetID", other.columns = c("Detection", "Avg_NBEADS"), verbose = FALSE)
> R> y = neqc(x)
> R> expressed = rowSums(y$other$Detection < 0.05) > 4
> R> y = y[expressed, ]
> R> ids = rownames(y)
> R> entrez = unlist(mget(ids, illuminaMousev2ENTREZID, ifnotfound = NA))
>
> Error in unlist(mget(ids, illuminaMousev2ENTREZID, ifnotfound = NA)) :
> error in evaluating the argument 'x' in selecting a method for function 'unlist': Error in FUN(c("ILMN_2735294", "ILMN_2417611", "ILMN_2545897", "ILMN_2762289", :
> attempt to use zero-length variable name
> Calls: mget ... as.list -> as.list -> .formatList -> lapply -> lapply -> FUN
>
> R> traceback()
> 1: unlist(mget(ids, illuminaMousev2ENTREZID, ifnotfound = NA))
>
> R> ids[ids == ""]
> [1] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
> [55] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
> [109] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
> [163] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
> [217] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
> [271] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
> [325] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
> [379] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
> [433] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
> [487] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
> [541] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
> [595] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
> [649] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
> [703] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
> [757] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
> [811] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
> [865] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
> [919] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
> [973] "" ""
>
> So there seem to be 974 empty strings in the row names, but there is nothing like that in the original data file. In addition, I would not have expected R to allow empty row names in the first place.
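
On the last point: a matrix in R does accept empty and duplicated row names, so the object itself is valid, and only the later lookup fails. The following is a rough sketch of a guard, written in base R with a hypothetical environment annot standing in for illuminaMousev2ENTREZID. The probe IDs and Entrez values are made up. It drops empty identifiers before calling mget, which works around the error but does not explain why the rows are unnamed.

```r
# A matrix accepts empty (and duplicated) row names without complaint.
m <- matrix(1:8, nrow = 4,
            dimnames = list(c("ILMN_1", "ILMN_3", "", ""), c("a", "b")))
ids <- rownames(m)

# Hypothetical annotation map standing in for illuminaMousev2ENTREZID.
annot <- list2env(list(ILMN_1 = "100", ILMN_2 = "200"))

# Drop empty identifiers before the lookup; IDs without a match become NA.
valid_ids <- ids[nzchar(ids)]
entrez <- unlist(mget(valid_ids, envir = annot, ifnotfound = NA))
print(entrez)
```

Filtering on nzchar(ids) keeps the lookup from seeing zero-length names, while ifnotfound = NA still reports identifiers that are non-empty but absent from the map.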
>
> Here is what the EListRaw object looks like after reading it into R.
>
> R> x = read.ilmn(files = "ProbeProfile.txt", ctrlfiles = "ControlProbeProfile.txt", probeid = "Probe_ID", annotation = "TargetID", other.columns = c("Detection", "Avg_NBEADS"), verbose = FALSE)
> R> x
> An object of class "EListRaw"
> $source
> [1] "illumina"
>
> $E
> 9379087005_A 9379087005_B 9379087022_A 9379087022_B 9379087005_C 9379087005_D 9379087022_C 9379087022_D 9379087005_E 9379087005_F 9379087022_E
> ILMN_2735294 420.8 401.8 395.8 422.9 360.1 358.5 420.7 327.1 178.8 343.4 425.5
> ILMN_2417611 323.8 280.2 294.1 315.5 542.5 301.0 398.0 133.7 235.9 382.0 512.7
> ILMN_2545897 98.3 109.2 128.0 124.5 111.3 102.6 110.2 106.6 87.2 104.6 101.8
> ILMN_2762289 91.7 88.3 94.2 95.5 88.1 81.2 88.5 88.0 79.4 85.3 84.5
> ILMN_1248788 87.6 84.7 92.0 92.9 85.9 84.0 93.8 86.9 77.5 84.9 86.3
> 9379087022_F
> ILMN_2735294 322.0
> ILMN_2417611 185.7
> ILMN_2545897 107.8
> ILMN_2762289 88.8
> ILMN_1248788 85.1
> 46250 more rows ...
>
> $genes
> TargetID Status
> 1 0610005A07RIK regular
> 2 0610005C13RIK regular
> 3 0610005H09RIK regular
> 4 0610005I04 regular
> 5 0610005K03RIK regular
> 46250 more rows ...
>
> $other
> $Detection
> 9379087005_A 9379087005_B 9379087022_A 9379087022_B 9379087005_C 9379087005_D 9379087022_C 9379087022_D 9379087005_E 9379087005_F 9379087022_E
> ILMN_2735294 0.00000 0.00000 0.0000 0.0000 0.0000 0.0000 0.00000 0.0000 0.00000 0.00000 0.00000
> ILMN_2417611 0.00000 0.00000 0.0000 0.0000 0.0000 0.0000 0.00000 0.0000 0.00000 0.00000 0.00000
> ILMN_2545897 0.08974 0.00321 0.0000 0.0000 0.0000 0.0000 0.00107 0.0000 0.00214 0.00214 0.00107
> ILMN_2762289 0.34402 0.49359 0.1998 0.1827 0.6068 0.9220 0.71047 0.4776 0.27350 0.58654 0.77991
> ILMN_1248788 0.76603 0.86004 0.3472 0.3718 0.8440 0.6645 0.21902 0.6004 0.58120 0.63675 0.53419
> 9379087022_F
> ILMN_2735294 0.0000
> ILMN_2417611 0.0000
> ILMN_2545897 0.0000
> ILMN_2762289 0.3440
> ILMN_1248788 0.7949
> 46250 more rows ...
>
> $Avg_NBEADS
> 9379087005_A 9379087005_B 9379087022_A 9379087022_B 9379087005_C 9379087005_D 9379087022_C 9379087022_D 9379087005_E 9379087005_F 9379087022_E
> ILMN_2735294 51 63 58 57 36 46 49 60 62 50 58
> ILMN_2417611 44 56 46 51 66 51 42 66 40 47 57
> ILMN_2545897 51 69 45 67 47 39 44 56 59 43 50
> ILMN_2762289 48 49 53 59 43 55 47 49 54 41 53
> ILMN_1248788 43 42 29 38 39 42 36 36 29 31 45
> 9379087022_F
> ILMN_2735294 50
> ILMN_2417611 56
> ILMN_2545897 58
> ILMN_2762289 42
> ILMN_1248788 38
> 46250 more rows ...
>
> Now looking at the end of the expression matrix:
>
> R> tail(x$E)
> 9379087005_A 9379087005_B 9379087022_A 9379087022_B 9379087005_C 9379087005_D 9379087022_C 9379087022_D 9379087005_E 9379087005_F 9379087022_E 9379087022_F
> 92.2 92.6 92.6 93.8 92.1 86.9 91.4 85.7 78.9 86.5 89.0 91.7
> 89.2 85.7 92.3 89.9 85.9 83.7 91.3 89.5 76.6 91.4 86.3 85.8
> 89.8 85.5 92.7 92.1 92.7 87.3 90.1 86.2 79.1 83.7 86.4 84.9
> 96.9 88.9 92.4 94.6 90.7 87.9 96.2 85.6 78.0 82.0 86.4 84.1
> 87.8 83.5 85.9 90.2 81.6 81.5 92.5 83.8 73.1 80.6 86.1 86.8
> 89.8 87.4 87.1 89.6 88.1 84.4 91.9 85.7 80.5 88.3 86.8 86.3
>
>
> R> sessionInfo()
> R Under development (unstable) (2013-06-26 r63071)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] splines parallel stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] xtable_1.7-1 vsn_3.29.1 reshape2_1.2.2 ratr_1.0 pheatmap_0.7.4 illuminaMousev2.db_1.18.0
> [7] org.Mm.eg.db_2.9.0 GOstats_2.27.1 graph_1.39.3 ggplot2_0.9.3.1 edgeR_3.3.8 limma_3.17.23
> [13] codetools_0.2-8 Category_2.27.3 GO.db_2.9.0 RSQLite_0.11.4 DBI_0.2-7 Matrix_1.0-12
> [19] lattice_0.20-15 Biostrings_2.29.19 XVector_0.1.4 IRanges_1.19.37 AnnotationDbi_1.23.23 Biobase_2.21.7
> [25] BiocGenerics_0.7.5 knitr_1.4.1 setwidth_1.0-3
>
> loaded via a namespace (and not attached):
> [1] affy_1.39.2 affyio_1.29.0 annotate_1.39.0 AnnotationForge_1.3.22 BiocInstaller_1.11.4 colorspace_1.2-2 dichromat_2.0-0
> [8] digest_0.6.3 evaluate_0.4.7 formatR_0.9 genefilter_1.43.0 grid_3.1.0 GSEABase_1.23.0 gtable_0.1.2
> [15] highr_0.2.1 labeling_0.2 MASS_7.3-26 munsell_0.4 plyr_1.8 preprocessCore_1.23.0 proto_0.3-10
> [22] RBGL_1.37.2 RColorBrewer_1.0-5 scales_0.2.3 stats4_3.1.0 stringr_0.6.2 survival_2.37-4 tools_3.1.0
> [29] XML_3.98-1.1 zlibbioc_1.7.0
> R>
>
> Any help and explanations appreciated!
>
> Cheers,
> Kemal
> --
> Kemal Akat
> Laboratory of RNA Molecular Biology
> The Rockefeller University
> 1230 York Avenue, Box #186
> New York, NY 10065
>