[BioC] incorrect gene symbols in annotate
Robert Gentleman
rgentlem at jimmy.harvard.edu
Fri Nov 21 17:55:33 MET 2003
To amplify a bit on Jianhua's explanation:
the mapping between any two sets of identifiers can be problematic
when both sides are subject to the constant evolution and improvement
that genomic data currently undergo.
There are many different strategies and folks need to pick one that
satisfies their particular needs. We have decided on a process that
we believe satisfies certain basic requirements (reproducibility
being of primary importance). All Bioconductor metadata packages are
produced in a well documented manner. The data sources and their
version numbers (or dates of acquisition if the data are not
versioned) are provided in the documentation for the package.
This allows users to verify our mappings (though only with respect to
the data we have selected and the way in which we have chosen to
resolve conflicts that arise). Differences between our mappings and
those available from other sources are not necessarily errors. They
may indicate changes in knowledge between when our mapping was done
and the current state. They may in fact represent errors, and we take
all reports such as this one seriously (but it would be helpful if
reporters indicated why they think there is an error, what their
data source is, etc.). We would especially welcome
suggestions for reliable data sources and/or mappings that are needed
that we do not presently supply.
I doubt that it is possible to stay current with all data sources
(and even if it were, we certainly do not have the resources to do so). I
personally feel that not providing a well-documented set of mappings,
and leaving researchers to search through the ever-changing
labyrinth that is the reality of web resources, does them a great
disservice. They can spend days trying to work out why the "same
analysis" done at different times yielded different sets of genes,
only to find that the web resource had changed between two
successive queries. This lack of reproducibility
seems very undesirable to me. We strive for reproducibility of
the numerical results; we should do the same for the mappings.
We build reasonably often (and can do so on demand), and
document how each build was done. We also archive all old
versions so that users can assess how changes have affected their
previous mappings if desired.
Robert
On Fri, Nov 21, 2003 at 11:37:51AM -0500, John Zhang wrote:
> You may get somewhat different results depending on the source you are comparing
> the mappings to and even the time when the comparisons are made. We try to keep
> the mappings updated as frequently as we can.
>
> The link "MetaData/Annotation Packages" on Bioconductor web site contains a
> brief description of the building process of the annotation data packages and
> the vignettes "How to use AnnBuilder" and "Basic Functions of AnnBuilder"
> contain instructions on how to build an annotation data package. You may try to
> build your own annotation data package to make sure your annotations are
> current.
>
>
> >
> >I have come across some errors when linking probe IDs with gene symbols.
> >
> >In most cases the probe ID retrieves the correct gene symbol;
> >however, the following probe IDs should correspond to CD4 antigen, CD4
> >antigen, and FCGR3A respectively.
> >
> >genes<-c("203547_at","216424_at","204006_s_at")
> >symbol<-multiget(genes,env=hgu133aSYMBOL)
> >
> >symbol
> >$"203547_at"
> >[1] "C3F"
> >
> >$"216424_at"
> >[1] NA
> >
> >$"204006_s_at"
> >[1] "FCGR3B"
> >
> >
> >
> >Regards
> >
> >
> >Anthony
> >
> >
> >
> >R session code
> >
> >
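> >library(Biobase)   # package names are case-sensitive: Biobase, not biobase
> >library(annotate)  # provides multiget()
> >library(hgu133a)   # HG-U133A annotation data package (hgu133aSYMBOL)
> >
> >genes <- c("203547_at", "216424_at", "204006_s_at")
> >symbol <- multiget(genes, env = hgu133aSYMBOL)
> >symbol             # print the probe-to-symbol mapping shown above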
> >--
> >______________________________________________
> >
> >Anthony Bosco - Cell Biology Research Assistant
> >
> >Institute for Child Health Research
> >(Company Limited by Guarantee ACN 009 278 755)
> >Subiaco, Western Australia, 6008
> >
> >Ph 61 8 9489 , Fax 61 8 9489 7700
> >email anthonyb at ichr.uwa.edu.au
> >______________________________________________
> >
> >_______________________________________________
> >Bioconductor mailing list
> >Bioconductor at stat.math.ethz.ch
> >https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
>
> Jianhua Zhang
> Department of Biostatistics
> Dana-Farber Cancer Institute
> 44 Binney Street
> Boston, MA 02115-6084
--
+---------------------------------------------------------------------------+
| Robert Gentleman phone : (617) 632-5250 |
| Associate Professor fax: (617) 632-2444 |
| Department of Biostatistics office: M1B20 |
| Harvard School of Public Health email: rgentlem at jimmy.harvard.edu |
+---------------------------------------------------------------------------+