[BioC] NA geneSymbol with lumi

Sebastien Gerega seb at gerega.net
Fri Nov 16 01:11:12 CET 2007


Sebastien Gerega <seb at ...> writes:

> 
> Hi,
> I am using the lumi package to analyse Illumina microarray data.
> When it finally comes to getting the top 10 DE genes with topTable, I get
> many hits with the geneSymbol <NA>. However, if I look up the ProbeIDs
> corresponding to the nuIDs that give <NA>, I find that they do correspond
> to genes. Why aren't they being displayed in the topTable?
> thanks,
> Sebastien
> 
>                       ID geneSymbol     logFC         t      P.Value    adj.P.Val        B
> 1917  fwfUovXT3rjAjqbpJU     S100A8 -5.307223 -50.43759 9.854174e-09 0.0001383625 8.724832
> 12632 Qd_S7V4OkLjsX3jkt4      KRT6B -5.281406 -39.54237 3.896317e-08 0.0002735409 8.229157
> 12149 BjSTT6BOqGLhpKKFGI       <NA> -3.118669 -30.01505 1.844180e-07 0.0008631377 7.451766
> 7474  6ipCUUDxcp4ryIj6Uk       <NA> -3.155916 -24.45685 5.835502e-07 0.0013366890 6.716048
> 3831  3nivfFfvk55Rd18lLk       <NA> -2.690362 -24.10891 6.324511e-07 0.0013366890 6.659617
> 


I have looked into this problem a little more...

I downloaded the Human6_v2_sequence spreadsheet from the Illumina
website and found that many of the targets that provide NA as
gene symbol have no symbol in the Illumina database either.

For example:

	ID	geneSymbol
5903	ILMN_21212	FAM43A
3103	ILMN_1425	FOXO4
11993	ILMN_6504	PPL
5153	ILMN_19390	ST3GAL4
1723	ILMN_12716	CREB3L2
4484	ILMN_17676	TNS3
2700	ILMN_138461	<NA>
1358	ILMN_12133	FSCN1
3507	ILMN_15271	CITED4
12401	ILMN_73087	<NA>

ILMN_73087 provides NA as gene symbol and does not have a gene
symbol in the Illumina DB either.

However, ILMN_138461 provides NA as gene symbol but does have a
gene symbol in the Illumina DB. It is APM-1.

In addition, ILMN_73087 has no entries in either the
Illumina or BioC DB, but when I search for ILMN_73087 in
Ensembl I get a hit that has multiple EntrezGene listings.
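
As a possible stopgap, the missing symbols could be back-filled from the
Illumina spreadsheet after topTable has run. The following is only a rough
sketch in base R: the data frames `top` and `illumina` are small hand-made
stand-ins (not real topTable output or the real spreadsheet), and the column
names `ProbeID` and `Symbol` are assumptions about how the spreadsheet might
look once read into R.

```r
# Stand-in for a few rows of topTable() output, using IDs from the listing above.
top <- data.frame(
  ID         = c("ILMN_21212", "ILMN_138461", "ILMN_73087"),
  geneSymbol = c("FAM43A", NA, NA),
  stringsAsFactors = FALSE
)

# Stand-in for the Illumina spreadsheet (hypothetical column names).
# Only the two probes discussed above are included.
illumina <- data.frame(
  ProbeID = c("ILMN_138461", "ILMN_73087"),
  Symbol  = c("APM-1", NA),
  stringsAsFactors = FALSE
)

# Look up each probe in the manufacturer table; unmatched probes give NA.
idx      <- match(top$ID, illumina$ProbeID)
fallback <- illumina$Symbol[idx]

# Fill in only the symbols that are missing; existing ones are left alone.
missing <- is.na(top$geneSymbol)
top$geneSymbol[missing] <- fallback[missing]

# Probes that are still unannotated after the fallback.
still_na <- top$ID[is.na(top$geneSymbol)]
```

With these stand-in rows, ILMN_138461 picks up APM-1 from the Illumina table,
while ILMN_73087 stays NA because neither source has a symbol for it.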

Is there any fix for the NA entries? Is this problem being addressed?
thanks,
Sebastien


