[BioC] [Fwd: Reporting problem with annotation in biomaRt, Illumina arrays - correction]

Pan Du dupan at northwestern.edu
Wed Feb 18 18:27:58 CET 2009


Hi Nenad and Marc,

Sorry for missing this discussion.

What is kept in the lumiHumanIDMapping.db package is basically the Illumina
manifest files for the different Illumina chips. If the problem exists, then
it is a problem with the Illumina manifest files, not with the package
itself.

Illumina has changed their IDs several times, and the different versions are
not compatible with each other. I am not sure which type of Illumina ID is
used in biomaRt. For the early version (version 1) of the Illumina chips, the
probe IDs are pure numbers. Later on they changed the IDs to the form
"ILMN_xxx". Illumina also provides Gene IDs (previously called Target IDs).
All of this causes a lot of confusion and difficulty in combining data. That's
the reason we invented the nuID (which is based on the probe sequence and is
globally unique), to avoid all of these problems.
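The ID formats described above can be told apart by pattern only up to a
point. The R sketch below is a hypothetical helper, not part of lumi or
lumiHumanIDMapping.db; its regular expressions are assumptions based solely
on the formats mentioned here (pure numbers for version 1, "ILMN_" followed by
digits later). A pure number could equally be an Array_Address_Id, which is
exactly the confusion reported further down in this thread, so format alone
cannot settle which ID type a value really is.

```r
# Hypothetical helper: guess the kind of Illumina identifier from its format.
# The patterns are assumptions based on the ID formats described above.
classifyIlluminaId <- function(ids) {
  ids <- as.character(ids)
  type <- rep("other", length(ids))
  # Version 1 probe IDs are pure numbers (but so are Array_Address_Id values)
  type[grepl("^[0-9]+$", ids)] <- "numeric"
  # Later probe IDs have the form ILMN_xxx
  type[grepl("^ILMN_[0-9]+$", ids)] <- "ILMN"
  type
}

# Example using the two identifiers quoted in the original report
classifyIlluminaId(c("105290026", "ILMN_1229450", "not_an_id"))
```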

The lumiHumanIDMapping.db package is provided for convenient conversion
between different types of IDs. It is mainly designed for conversion between
nuIDs and Illumina IDs. Users can also use it for conversion between
different Illumina IDs by writing simple scripts themselves.
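As an illustration of such a script, here is a minimal R sketch, assuming two
mapping tables that each have an "nuID" column and a "ProbeId" column. The
nuID column name, the function name convertIlluminaIds and all values below
are hypothetical placeholders, not read from the package. Because probes with
the same sequence share a nuID, the nuID can serve as the common key linking
two Illumina ID systems.

```r
# Toy mapping tables standing in for two chip versions. Column names and
# values are hypothetical placeholders, not taken from lumiHumanIDMapping.db.
v1 <- data.frame(nuID    = c("nuA", "nuB", "nuC"),
                 ProbeId = c("100001", "100002", "100003"),
                 stringsAsFactors = FALSE)
v2 <- data.frame(nuID    = c("nuB", "nuC", "nuD"),
                 ProbeId = c("ILMN_0002", "ILMN_0003", "ILMN_0004"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: translate IDs from one table's ID system to another's
# by going through the shared nuID. IDs without a match come back as NA.
convertIlluminaIds <- function(ids, from, to) {
  nu <- from$nuID[match(ids, from$ProbeId)]  # old ID -> nuID
  to$ProbeId[match(nu, to$nuID)]             # nuID -> new ID
}

convertIlluminaIds(c("100001", "100002", "100003"), v1, v2)
```

With real data, `from` and `to` could be tables read with dbReadTable() as in
Nenad's example further down, provided their column names match; check them
with names() first. match() returns only the first hit, so a nuID that maps
to several probes would need extra handling.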



Pan


On 2/18/09 10:33 AM, "Marc Carlson" <mcarlson at fhcrc.org> wrote:

> From: Nenad Bartonicek <nbartonicek at gmail.com>
> Date: Wed, 18 Feb 2009 09:33:46 +0000
> To: <bioconductor at stat.math.ethz.ch>
> Subject: [BioC] Reporting problem with annotation in biomaRt, Illumina arrays
> - correction
> 
> Dear all,
> 
> My apologies, the annotation problem was not with biomaRt, but with
> the prepackaged datasets:
> 
> 1. lumiHumanIDMapping.db and
> 2. lumiMouseIDMapping.db.
> 
> The description of the problem remains the same, though.
> 
> Giulietta and Wolfgang, thank you for the prompt reply.
> 
> Regards,
> 
> Nenad
> 
> p.s. The missing sessionInfo():
> 
> R version 2.8.0 (2008-10-20)
> i386-apple-darwin8.11.1
> 
> locale:
> en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8
> 
> attached base packages:
> [1] stats     graphics  grDevices datasets  tools     utils     methods
> [8] base
> 
> other attached packages:
>   [1] lumiMouseIDMapping.db_1.0.0 lumiHumanIDMapping.db_1.0.0
>   [3] lumi_1.8.3                  RSQLite_0.7-1
>   [5] preprocessCore_1.4.0        mgcv_1.4-1.1
>   [7] affy_1.20.2                 annotate_1.20.1
>   [9] xtable_1.5-4                AnnotationDbi_1.4.2
> [11] RMySQL_0.7-2                DBI_0.2-4
> [13] biomaRt_1.16.0              R.utils_1.1.1
> [15] R.oo_1.4.6                  R.methodsS3_1.0.3
> [17] Biobase_2.2.1
> 
> loaded via a namespace (and not attached):
> [1] RCurl_0.94-0  XML_1.99-0    affyio_1.10.1
> 
> 
> 
> 
>>> Hi Nenad,
>>> 
>>> I had a look at our BioMart web interface at:
>>> 
>>> www.ensembl.org/biomart/martview
>>> 
>>> It appears to me that the Illumina V1 probe set attribute for mouse
>>> gives the correct probe names. I believe the Illumina V1 set for mouse
>>> is the same as MouseWG6_V1.
>>> 
>>> If you give me more details (i.e. which genes you are looking at, or
>>> which filters you applied) I can give this another try.  At first
>>> glance, it doesn't look like an Ensembl data problem.
>>> 
>>> Regards,
>>> Giulietta (Ensembl Helpdesk)
>>> 
>>> 
>>> On Tue Feb 17 13:35:57 2009, huber at ebi.ac.uk wrote:
>>>> Hi Nenad
>>>> 
>>>> thank you for reporting this!
>>>> 
>>>> Since your question raises a more general operational question with
>>>> biomaRt, I'd like to use the opportunity to explain to this list
>>>> what's going on (not quite) behind the scenes. There are three
>>>> separate organisations involved in this information chain:
>>>> 
>>>> 1. The Ensembl database team (in Cambridge UK)
>>>> 
>>>> 2. The BioMart software developers (in Toronto CA) and Rhoda Kinsella
>>>> (in Cambridge), who imports the Ensembl data into the BioMart system
>>>> 
>>>> 3. Bioconductor and specifically the biomaRt R package, which is simply
>>>> a thin interface from R to a web service, with no content or logic of
>>>> its own (maintained by Steffen Durinck in sunny Berkeley).
>>>> 
>>>> Questions at levels 2 and 3 are good to ask on this list and are usually
>>>> answered efficiently, e.g. by Steffen or Rhoda.
>>>> 
>>>> What you report is, afaict, an Ensembl data content problem, i.e.
>>>> level 1. Here the advice is to email the Ensembl help desk:
>>>> helpdesk at ensembl.org
>>>> 
>>>> I hope this helps; please let us know if you have any more questions
>>>> or observations.
>>>> 
>>>> Best wishes
>>>>     Wolfgang
>>>> 
>>>> ----------------------------------------------------
>>>> Wolfgang Huber, EMBL-EBI, http://www.ebi.ac.uk/huber
>>>> 
>>>> 
>>>> -------- Original Message --------
>>>> Subject: [BioC] Reporting problem with annotation in biomaRt,
>>>> Illumina arrays
>>>> Date: Tue, 17 Feb 2009 12:02:37 +0000
>>>> From: Nenad Bartonicek <nenad at ebi.ac.uk>
>>>> To: bioconductor at stat.math.ethz.ch
>>>> 
>>>> Dear all,
>>>> 
>>>> There seems to be a problem with the probe annotation of certain
>>>> Illumina arrays in biomaRt.
>>>> 
>>>> The following arrays, HumanWG6_V1, HumanRef8_V1, MouseWG6_V1 and
>>>> MouseWG6_V1_B, do not have valid Illumina probe names in the "ProbeId"
>>>> column. Instead, they seem to contain the values from the
>>>> "Array_Address_Id" column, which is the one next to the Probe_id column
>>>> in the official Illumina flat files.
>>>> 
>>>> For example, for the array "MouseWG6_V1":
>>>> 
>>>> library(lumiMouseIDMapping.db)
>>>> library(DBI)  # dbListTables() and dbReadTable() come from DBI
>>>> dbconn <- lumiMouseIDMapping_dbconn()
>>>> tableNames <- dbListTables(dbconn)
>>>> # keep only the mouse chip tables
>>>> tableNames <- tableNames[grep("Mouse", tableNames)]
>>>> tableNames
>>>> data <- dbReadTable(dbconn, "MouseWG6_V1")
>>>> head(data)
>>>> 
>>>> The column ProbeId contains the identifier "105290026", which in the
>>>> flat file at
>>>> http://www.switchtoi.com/pdf/Annotation%20Files/Mouse/MouseWG-6_V1_1_R4_11234304_A.zip
>>>> appears under the column Array_Address_Id and has the proper
>>>> identifier "ILMN_1229450".
>>>> 
>>>> I hope this helps and that it can be corrected at some point,
>>>> 
>>>> Nenad
>>>> 
>>>> Nenad Bartonicek
>>>> EMBL - European Bioinformatics Institute
>>>> Wellcome Trust Genome Campus
>>>> Hinxton, Cambridge
>>>> 
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>> 
>>>> 
> 


------------------------------------------------------
Pan Du, PhD
Research Assistant Professor
Northwestern University Biomedical Informatics Center
750 N. Lake Shore Drive, 11-176
Chicago, IL  60611
Office (312) 503-2360; Fax: (312) 503-5388
dupan (at) northwestern.edu


