[BioC] limma_3.17.23 - missing ILMN identifiers in EList objects after read.ilmn

Wei Shi shi at wehi.EDU.AU
Thu Oct 10 01:10:46 CEST 2013


Dear Kemal,

The rows with empty names are likely to be control probes, because read.ilmn always places control probes at the end of the data matrix (x in your data). These probes should, however, have been removed when you ran the neqc function, and that does not seem to have happened. Could you please run the following command so that I can see whether neqc was able to identify the control probes?

table(x$genes$Status)
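
As a rough illustration of what that check looks for, here is a toy sketch in base R. It does not use limma or your files: the object `toy`, the probe IDs, the sample names and the intensities are all made up, and the "negative" status label is only assumed to stand for the control probes. It simply mimics a matrix in which the control probes sit at the end with empty row names.

```r
# Toy stand-in for an EListRaw: three regular probes followed by two
# control probes with empty row names (IDs and values are hypothetical).
E <- matrix(c(420.8, 323.8, 98.3, 92.2, 89.2,
              401.8, 280.2, 109.2, 92.6, 85.7),
            ncol = 2,
            dimnames = list(c("ILMN_0000001", "ILMN_0000002", "ILMN_0000003", "", ""),
                            c("sample_A", "sample_B")))
toy <- list(E = E,
            genes = data.frame(Status = c("regular", "regular", "regular",
                                          "negative", "negative"),
                               stringsAsFactors = FALSE))

# Count probes per status; control probes appear as non-"regular" entries.
status_counts <- table(toy$genes$Status)
print(status_counts)

# Cross-tabulate empty row names against status to see whether the
# unnamed rows are exactly the non-regular (control) probes.
empty_name <- rownames(toy$E) == ""
print(table(empty_name, toy$genes$Status))

# Keep only the regular probes; their row names are the usable probe IDs.
ids <- rownames(toy$E)[toy$genes$Status == "regular"]
print(ids)
```

If the empty row names line up with the non-regular statuses, as they do in this toy example, then the unnamed rows are the control probes.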

Best regards,
Wei

On Oct 10, 2013, at 5:39 AM, Kemal Akat wrote:

> Dear colleagues,
> 
> I am currently analyzing an Illumina Mouse v2 bead array dataset using limma and ran into an error I don't quite understand. It came up when I tried to annotate the differentially expressed genes later in the analysis. The problem seems to stem from empty strings in the vector I provide to retrieve the annotation info, but I don't understand how this can happen in the first place.
> 
> The probe and control profiles were exported from GenomeStudio without background correction and normalization.
> 
> Here is the code I ran:
> 
> R> x = read.ilmn(files = "ProbeProfile.txt", ctrlfiles = "ControlProbeProfile.txt", probeid = "Probe_ID", annotation = "TargetID", other.columns = c("Detection", "Avg_NBEADS"), verbose = FALSE)
> R> y = neqc(x)
> R> expressed = rowSums(y$other$Detection < 0.05) > 4
> R> y = y[expressed, ]
> R> ids = rownames(y)
> R> entrez = unlist(mget(ids, illuminaMousev2ENTREZID, ifnotfound = NA))
> 
> Error in unlist(mget(ids, illuminaMousev2ENTREZID, ifnotfound = NA)) : 
>  error in evaluating the argument 'x' in selecting a method for function 'unlist': Error in FUN(c("ILMN_2735294", "ILMN_2417611", "ILMN_2545897", "ILMN_2762289",  : 
>  attempt to use zero-length variable name
> Calls: mget ... as.list -> as.list -> .formatList -> lapply -> lapply -> FUN
> 
> R> traceback()
> 1: unlist(mget(ids, illuminaMousev2ENTREZID, ifnotfound = NA))
> 
> R> ids[ids == ""]
>  [1] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
> [55] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
> [109] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
> [163] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
> [217] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
> [271] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
> [325] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
> [379] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
> [433] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
> [487] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
> [541] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
> [595] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
> [649] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
> [703] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
> [757] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
> [811] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
> [865] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
> [919] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
> [973] "" ""
> 
> So there seem to be 974 empty strings in the row names, but there is nothing like that in the original data file. In addition, should empty row names even be possible in R in the first place?
> 
> Here is what the EListRaw object looks like after reading it into R.
> 
> R> x = read.ilmn(files = "ProbeProfile.txt", ctrlfiles = "ControlProbeProfile.txt", probeid = "Probe_ID", annotation = "TargetID", other.columns = c("Detection", "Avg_NBEADS"), verbose = FALSE)
> R> x
> An object of class "EListRaw"
> $source
> [1] "illumina"
> 
> $E
>             9379087005_A 9379087005_B 9379087022_A 9379087022_B 9379087005_C 9379087005_D 9379087022_C 9379087022_D 9379087005_E 9379087005_F 9379087022_E
> ILMN_2735294        420.8        401.8        395.8        422.9        360.1        358.5        420.7        327.1        178.8        343.4        425.5
> ILMN_2417611        323.8        280.2        294.1        315.5        542.5        301.0        398.0        133.7        235.9        382.0        512.7
> ILMN_2545897         98.3        109.2        128.0        124.5        111.3        102.6        110.2        106.6         87.2        104.6        101.8
> ILMN_2762289         91.7         88.3         94.2         95.5         88.1         81.2         88.5         88.0         79.4         85.3         84.5
> ILMN_1248788         87.6         84.7         92.0         92.9         85.9         84.0         93.8         86.9         77.5         84.9         86.3
>             9379087022_F
> ILMN_2735294        322.0
> ILMN_2417611        185.7
> ILMN_2545897        107.8
> ILMN_2762289         88.8
> ILMN_1248788         85.1
> 46250 more rows ...
> 
> $genes
>       TargetID  Status
> 1 0610005A07RIK regular
> 2 0610005C13RIK regular
> 3 0610005H09RIK regular
> 4    0610005I04 regular
> 5 0610005K03RIK regular
> 46250 more rows ...
> 
> $other
> $Detection
>             9379087005_A 9379087005_B 9379087022_A 9379087022_B 9379087005_C 9379087005_D 9379087022_C 9379087022_D 9379087005_E 9379087005_F 9379087022_E
> ILMN_2735294      0.00000      0.00000       0.0000       0.0000       0.0000       0.0000      0.00000       0.0000      0.00000      0.00000      0.00000
> ILMN_2417611      0.00000      0.00000       0.0000       0.0000       0.0000       0.0000      0.00000       0.0000      0.00000      0.00000      0.00000
> ILMN_2545897      0.08974      0.00321       0.0000       0.0000       0.0000       0.0000      0.00107       0.0000      0.00214      0.00214      0.00107
> ILMN_2762289      0.34402      0.49359       0.1998       0.1827       0.6068       0.9220      0.71047       0.4776      0.27350      0.58654      0.77991
> ILMN_1248788      0.76603      0.86004       0.3472       0.3718       0.8440       0.6645      0.21902       0.6004      0.58120      0.63675      0.53419
>             9379087022_F
> ILMN_2735294       0.0000
> ILMN_2417611       0.0000
> ILMN_2545897       0.0000
> ILMN_2762289       0.3440
> ILMN_1248788       0.7949
> 46250 more rows ...
> 
> $Avg_NBEADS
>             9379087005_A 9379087005_B 9379087022_A 9379087022_B 9379087005_C 9379087005_D 9379087022_C 9379087022_D 9379087005_E 9379087005_F 9379087022_E
> ILMN_2735294           51           63           58           57           36           46           49           60           62           50           58
> ILMN_2417611           44           56           46           51           66           51           42           66           40           47           57
> ILMN_2545897           51           69           45           67           47           39           44           56           59           43           50
> ILMN_2762289           48           49           53           59           43           55           47           49           54           41           53
> ILMN_1248788           43           42           29           38           39           42           36           36           29           31           45
>             9379087022_F
> ILMN_2735294           50
> ILMN_2417611           56
> ILMN_2545897           58
> ILMN_2762289           42
> ILMN_1248788           38
> 46250 more rows ...
> 
> Now looking at the end of the file:
> 
> R> tail(x$E)
> 9379087005_A 9379087005_B 9379087022_A 9379087022_B 9379087005_C 9379087005_D 9379087022_C 9379087022_D 9379087005_E 9379087005_F 9379087022_E 9379087022_F
>         92.2         92.6         92.6         93.8         92.1         86.9         91.4         85.7         78.9         86.5         89.0         91.7
>         89.2         85.7         92.3         89.9         85.9         83.7         91.3         89.5         76.6         91.4         86.3         85.8
>         89.8         85.5         92.7         92.1         92.7         87.3         90.1         86.2         79.1         83.7         86.4         84.9
>         96.9         88.9         92.4         94.6         90.7         87.9         96.2         85.6         78.0         82.0         86.4         84.1
>         87.8         83.5         85.9         90.2         81.6         81.5         92.5         83.8         73.1         80.6         86.1         86.8
>         89.8         87.4         87.1         89.6         88.1         84.4         91.9         85.7         80.5         88.3         86.8         86.3
> 
> 
> R> sessionInfo()
> R Under development (unstable) (2013-06-26 r63071)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
> 
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> 
> attached base packages:
> [1] splines   parallel  stats     graphics  grDevices utils     datasets  methods   base     
> 
> other attached packages:
> [1] xtable_1.7-1              vsn_3.29.1                reshape2_1.2.2            ratr_1.0                  pheatmap_0.7.4            illuminaMousev2.db_1.18.0
> [7] org.Mm.eg.db_2.9.0        GOstats_2.27.1            graph_1.39.3              ggplot2_0.9.3.1           edgeR_3.3.8               limma_3.17.23            
> [13] codetools_0.2-8           Category_2.27.3           GO.db_2.9.0               RSQLite_0.11.4            DBI_0.2-7                 Matrix_1.0-12            
> [19] lattice_0.20-15           Biostrings_2.29.19        XVector_0.1.4             IRanges_1.19.37           AnnotationDbi_1.23.23     Biobase_2.21.7           
> [25] BiocGenerics_0.7.5        knitr_1.4.1               setwidth_1.0-3           
> 
> loaded via a namespace (and not attached):
> [1] affy_1.39.2            affyio_1.29.0          annotate_1.39.0        AnnotationForge_1.3.22 BiocInstaller_1.11.4   colorspace_1.2-2       dichromat_2.0-0       
> [8] digest_0.6.3           evaluate_0.4.7         formatR_0.9            genefilter_1.43.0      grid_3.1.0             GSEABase_1.23.0        gtable_0.1.2          
> [15] highr_0.2.1            labeling_0.2           MASS_7.3-26            munsell_0.4            plyr_1.8               preprocessCore_1.23.0  proto_0.3-10          
> [22] RBGL_1.37.2            RColorBrewer_1.0-5     scales_0.2.3           stats4_3.1.0           stringr_0.6.2          survival_2.37-4        tools_3.1.0           
> [29] XML_3.98-1.1           zlibbioc_1.7.0        
> 
> Any help and explanations appreciated!
> 
> Cheers,
> Kemal
> --
> Kemal Akat
> Laboratory of RNA Molecular Biology
> The Rockefeller University
> 1230 York Avenue, Box #186
> New York, NY 10065
> 
