[BioC] Illumina annotation packages discrepancy

Tue Dec 2 08:51:29 CET 2008

You'll want to use the illuminaHumanv2ProbeID.db package.
Lynn
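
P.S. When comparing packages, it may help to count and list the probes that come back without annotation. Below is a minimal sketch in base R, assuming the result has the shape returned by the lookUp calls quoted further down (a named list with one element per probe, NA where no mapping exists). The helper names count_unannotated and unannotated_ids and the toy probe IDs are made up for illustration; they are not part of any package.

```r
# Count how many probes have no annotation in a lookUp()-style result.
# `annotation` is assumed to be a named list with one element per probe;
# an element that is entirely NA means no mapping was found for that probe.
count_unannotated <- function(annotation) {
  sum(vapply(annotation, function(x) all(is.na(x)), logical(1)))
}

# Return the IDs of the probes lacking annotation (e.g. to BLAST them later).
unannotated_ids <- function(annotation) {
  missing <- vapply(annotation, function(x) all(is.na(x)), logical(1))
  names(annotation)[missing]
}

# Toy data standing in for a real lookUp() result; the IDs are hypothetical.
toy <- list(probe_a = "tumor protein p53", probe_b = NA, probe_c = NA_character_)

n_missing <- count_unannotated(toy)
ids_missing <- unannotated_ids(toy)
print(n_missing)
print(ids_missing)
```

With real data, the same helpers could be applied to the output of each lookUp call, so that the lists of missing IDs can be compared between packages that share an ID type.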

Renaud Gaujoux wrote:
> Oops... I'm really sorry, Mark, for the confusion. I think I misread the 
> vignette.
>
> I BLASTed some of the missing probes and some of them gave quite 
> convincing results (100% identity but with different variants), others 
> didn't return any sequence. So I'll try with the package from 2.2.
>
> Thanks again,
> Renaud
>
> Lynn Amon wrote:
>> The illuminaHumanv2.db package is not a "proprietary" package.  It is 
>> currently maintained by Mark Dunning (Mark.Dunning at cancer.org.uk).  
>> It is based on BLASTed sequences, but there was a problem in creating 
>> the package: when more than one accession was assigned to a probe, 
>> the annotation program skipped that probe entirely, which is why you 
>> are finding so many probes without annotation.  You should contact 
>> Mark to find out whether that problem has been corrected and a new 
>> version released.  You could also try the 2.2 release, which I 
>> created and which has annotation for all those probes.
>> Lynn
>>
>>
>> Renaud Gaujoux wrote:
>>> Hi Pan,
>>>
>>> Thanks for your answer. I've been (and still am) struggling a bit to 
>>> get consistent and up-to-date annotation for my data.
>>>
>>> So, I guess it is more reliable to use the lumiHumanAll.db package?
>>>
>>> However, what about the probes that are not annotated in 
>>> lumiHumanAll but look interesting for my study (i.e. they appear 
>>> in my top lists for differential expression or classification power)?
>>> I've got such probes that are annotated in neither lumiHumanAll.db 
>>> nor lumiHumanV2 but are in illuminaHumanv2.
>>>
>>> Hence no package gives me consistent annotation for my top genes. 
>>> However I've got an annotation file (that came with the array data, 
>>> I guess output by BeadStudio) that gives me annotations for all of 
>>> my probes. But as you mentioned, these might be outdated, which 
>>> actually bothers me. Any suggestion about that?
>>>
>>> By the way, how come even Illumina "proprietary" packages 
>>> (illuminaHumanv2.db) don't correctly annotate their own probes? :(
>>>
>>> Thanks again for your help and clarification, and the lumi package.
>>>
>>> Renaud
>>>
>>>
>>> Pan Du wrote:
>>>> Hi Renaud,
>>>>
>>>> The discrepancy is due to different mapping criteria. Both the
>>>> "lumiHumanAll.db" and "illuminaHumanv2.db" libraries are based on
>>>> BLAST results against the RefSeq database. The "lumiHumanAll.db"
>>>> library is nuID indexed and includes all the probes of different
>>>> versions. For the mapping from probe to RefSeq, it defines both
>>>> sensitivity and specificity criteria (see the vignette
>>>> "IlluminaAnnotation.Rnw" in the lumi package). As a result, it might
>>>> include fewer mappings than "illuminaHumanv2.db", because
>>>> "lumiHumanAll.db" filters out some dubious mappings (e.g., a probe
>>>> with multiple perfect matches).
>>>>
>>>> The "lumiHumanV2" library was built from the original annotation
>>>> provided by Illumina. As a result, it has many more probe mappings.
>>>> However, many of these mappings might be outdated because the genome
>>>> annotation has since been updated.
>>>>
>>>> Hope this will clarify the confusion.
>>>>
>>>>
>>>> Pan
>>>>
>>>>
>>>> On 11/28/08 5:00 AM, "bioconductor-request at stat.math.ethz.ch"
>>>> <bioconductor-request at stat.math.ethz.ch> wrote:
>>>>
>>>>  
>>>>> Date: Thu, 27 Nov 2008 16:03:36 +0200
>>>>> From: Renaud Gaujoux <renaud at mancala.cbio.uct.ac.za>
>>>>> Subject: [BioC] Illumina annotation packages discrepancy
>>>>> To: bioconductor at stat.math.ethz.ch
>>>>> Message-ID: <492EA8B8.5000400 at cbio.uct.ac.za>
>>>>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>>>>
>>>>> Hi list,
>>>>>
>>>>> I've got BeadSummary data from Illumina (Array content:
>>>>> HUMANREF-8_V2_11223162_B.XML.xml).
>>>>> I imported it in R using the function lumi.batch.
>>>>> This automatically computed the nuID for each probe and set the
>>>>> annotation package to lumiHumanAll.db.
>>>>> This is all good.
>>>>>
>>>>> BUT, when I do
>>>>>
>>>>> lookUp(nuIDs, 'lumiHumanAll.db', 'GENENAME')
>>>>>
>>>>> I get 2921 out of 20589 probes with NA.
>>>>>
>>>>> If I do the same using the old annotation package lumiHumanV2:
>>>>>
>>>>> lookUp(nuIDs, 'lumiHumanV2', 'GENENAME')
>>>>>
>>>>> I get 454 out of 20589 probes with NA.
>>>>>
>>>>> Finally, if I do the same using the annotation package
>>>>> illuminaHumanv2.db (but based on the corresponding TargetIDs):
>>>>>
>>>>> lookUp(targetIDs, 'illuminaHumanv2.db', 'GENENAME')
>>>>>
>>>>> I get 2041 out of 20589 probes with NA.
>>>>>
>>>>> Can anybody give me an explanation for that discrepancy? And which
>>>>> annotation package should I use, given that some probes that are
>>>>> interesting for my experiment don't have annotation in the new version?
>>>>>
>>>>> Also, I could not find any reference to that HUMANREF-8_V2_11223162_B
>>>>> annotation (neither on the Illumina website nor in Bioconductor
>>>>> packages). I only found information about HUMANREF-8_V2_11223162_A.
>>>>> Is the letter suffix (A or B) really important?
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>     
>>>>
>>>>
>>>> ------------------------------------------------------
>>>> Pan Du, PhD
>>>> Research Assistant Professor
>>>> Northwestern University Biomedical Informatics Center
>>>> 750 N. Lake Shore Drive, 11-176
>>>> Chicago, IL  60611
>>>> Office (312) 503-2360; Fax: (312) 503-5388
>>>> dupan (at) northwestern.edu
>>>> ------------------------------------------------------
>>>
>