[BioC] problem illumina annotation with lumi
    Md.Mamunur Rashid 
    mamunur.rashid at kcl.ac.uk
       
    Thu Dec 10 16:04:44 CET 2009
    
    
  
Dear List,
I am trying to annotate some Illumina microarray probes (HumanHT-12 v3) from an experiment
of 96 samples. Apparently there are some differences between annotating with
illuminaHumanv3BeadID.db and lumiHumanAll.db.
Here is what I have done, in brief:
1. I read and processed the raw data file with the lumi package (this included detection p-value filtering).
2. I found some differentially expressed genes using a linear model.
Now my topTable looks like this:
>  top<- topTable(aneu348_fit2,coef=2,adjust="BH")
>  top
            ID      logFC  AveExpr         t      P.Value  adj.P.Val        B
19287  730612  0.1968519 6.506182  5.446788 1.729911e-06 0.03507526 4.750244
19897 3520463  0.3286017 7.057259  5.390423 2.103278e-06 0.03507526 4.580566
3028  2650605  0.4613558 7.115252  5.309757 2.780147e-06 0.03507526 4.338214
3956  3310538  0.5527499 8.000359  5.113185 5.466881e-06 0.05172900 3.750403
1626  3390605 -0.2277937 6.930935 -4.890353 1.168046e-05 0.07558592 3.089894
25875 6280470  0.5706626 7.235376  4.841339 1.378711e-05 0.07558592 2.945587
34978 6760546  0.3195073 7.659098  4.783197 1.677400e-05 0.07558592 2.774918
32380 3940692 -0.2995773 8.258397 -4.756620 1.834288e-05 0.07558592 2.697098
35264 1740020 -0.3454641 7.384281 -4.734429 1.976252e-05 0.07558592 2.632216
33126 6040398  0.5112817 7.517186  4.731312 1.997039e-05 0.07558592 2.623109
Then I try to annotate the top IDs with gene name, gene symbol, Entrez ID and others.
** As you can see from the topTable result, my probe IDs are the
Array_Address_Id values (according to the Illumina manifest file HumanHT-12_v3_0_R2_11283641_A).
>   geneSymbol<- getSYMBOL(aneu348_probeList, 'illuminaHumanv3BeadID.db')
>   geneName<- sapply(lookUp(aneu348_probeList, 'illuminaHumanv3BeadID.db', 'GENENAME'), function(x) x[1])
  This gives me the correct gene names and symbols (according to the manifest file).
  But when I try to convert these probe IDs using the IlluminaID2nuID() or probeID2nuID() function,
  they map to a completely different set of gene names and symbols.
I then added "000" before all of my probe IDs and passed them to the IlluminaID2nuID() function:
>  top<- paste("000",top,sep="")
>  illu<- IlluminaID2nuID(top)
Warning messages:
1: In getChipInfo(IlluminaID, lib.mapping = lib.mapping, species = species,  :
   Some input IDs can not be matched!
2: In if (!is.na(chipInfo$IDType)) { :
   the condition has length>  1 and only the first element will be used
>  illu[1,]         # Here illu[1,] holds the mapping for "000730612"
       Search_Key        ILMN_Gene        Accession           Symbol
               NA               NA               NA               NA
         Probe_Id Array_Address_Id             nuID
               NA               NA               NA
For some reason it shows NA for a few of the probes, even though passing them
individually to the function returns the correct mapping:
>  IlluminaID2nuID(top[1])     # here  top[1] = "000730612"
          Search_Key   ILMN_Gene Accession     Symbol  Probe_Id
000730612 "ILMN_10981" "HTRA1"   "NM_002775.3" "HTRA1" "ILMN_1676563"
          Array_Address_Id nuID
000730612 "000730612"      "ZEObIyCCVRqJSjqHrY"
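A note on the padding step: pasting a fixed "000" prefix only produces IDs of equal length when every input has the same number of digits, and the IDs in the topTable have both six and seven digits. Below is a minimal base-R sketch of padding to a fixed width instead. The target width of 9 characters is an assumption inferred only from the single "000730612" example, and `pad_address_id` is a hypothetical helper, not part of lumi.

```r
# Hypothetical helper: left-pad numeric array address IDs with zeros.
# The default width of 9 is an assumption based on the one example "000730612".
pad_address_id <- function(ids, width = 9L) {
  ids <- as.character(ids)
  # Refuse anything that is not purely numeric (e.g. "ILMN_..." IDs).
  stopifnot(all(grepl("^[0-9]+$", ids)))
  formatC(as.integer(ids), width = width, flag = "0", format = "d")
}

ids <- c("730612", "3520463", "2650605")
padded <- pad_address_id(ids)
print(padded)

# For comparison: a fixed prefix gives strings of mixed length.
prefixed <- paste("000", ids, sep = "")
print(nchar(prefixed))
```

Whether lumi expects exactly this width for every chip is not something the sketch establishes; it only makes the padded IDs uniform.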
So my questions are:
1. Why can the above functions not find an entry for a few probe IDs even though the entries are present?
2. Is the workaround I found (adding "000" at the beginning) correct, or are there
    better options?
3. Why does getChipInfo() give the following, even though it is a HumanHT-12 chip?
    getChipInfo(aneu348_N)
    $chipVersion
    [1] "HumanWG6_V2_11223189_B"
    	
4. I am trying to develop a workflow which will handle data with both types of probe ID
    pattern (i.e. "ILMN_1805" or "730612"). What would be the standard way to annotate
    both types of data?
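For question 4, one possible first step is to sort incoming IDs by pattern before choosing an annotation route. Below is a minimal base-R sketch. It assumes that IDs of the form "ILMN_" plus digits are Probe_Id values and that all-digit IDs are Array_Address_Id values; `classify_probe_id` and its labels are hypothetical names, and no lumi or annotation-package call is involved.

```r
# Hypothetical helper: label each probe ID by its textual pattern only.
# Assumption: "ILMN_<digits>" is a Probe_Id, all digits is an Array_Address_Id.
classify_probe_id <- function(ids) {
  ids <- as.character(ids)
  type <- rep("unknown", length(ids))
  type[grepl("^ILMN_[0-9]+$", ids)] <- "probe_id"
  type[grepl("^[0-9]+$", ids)] <- "array_address_id"
  type
}

ids <- c("ILMN_1805", "730612", "0003520463", "HTRA1")
id_type <- classify_probe_id(ids)

# Group the IDs so that each group can be sent to its own lookup.
print(split(ids, id_type))
```

Each group could then be passed to whichever mapping suits that ID type, with the "unknown" group reported rather than silently dropped.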
Please accept my apologies if this mail seems long. I tried to provide as much detail as I could.
Thanks in advance,
regards,
Mamun
    
    