[BioC] Biocore Data Team : package mouse4302.db bug

Marc Carlson mcarlson at fhcrc.org
Wed Mar 9 19:34:00 CET 2011


Hi Vojtech,

You didn't share the output of sessionInfo() with us, so I have to
assume that you are using the latest R release and packages.  You might
find it helpful to have a look at our posting guide:

http://www.bioconductor.org/help/mailing-list/posting-guide/

Anyhow, if I assume that you are using R 2.12 and the matching
mouse4302.db package, then you have run into a situation that often
crops up with microarrays: the manufacturer is uncertain about which
gene some of their probes actually measure.  In the annotation packages,
when this happens we hide the results of these mappings by default.
But they can still be unmasked using the toggleProbes function, like so:

AllEntrezMap = toggleProbes(mouse4302ENTREZID, "all")
mget(c("1423603_at","1451046_at"), AllEntrezMap)

And that will display all of the Entrez Gene IDs that each of these
probesets was believed (by Affymetrix) to map to.  In contrast, if you
had just called mget on the original mapping you would have gotten NAs
(because the ambiguous mappings are masked).
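
To see the difference side by side, and to do the symbol-based lookup
from your original code, something along these lines should work.  This
is only a sketch: I am assuming the mouse4302.db package is installed,
the variable names (probes, masked, unmasked, sym2probe) are just
illustrative, and the exact values you get back will depend on which
version of the package you have.

```r
# Probeset IDs from the original question.
probes <- c("1423603_at", "1451046_at")

# Assumes mouse4302.db is installed; loading it also attaches AnnotationDbi.
if (requireNamespace("mouse4302.db", quietly = TRUE)) {
  library(mouse4302.db)

  # Default map: probesets with ambiguous gene assignments are masked.
  masked <- mget(probes, mouse4302ENTREZID, ifnotfound = NA)
  print(masked)

  # Same lookup with the masking switched off.
  unmasked <- mget(probes, toggleProbes(mouse4302ENTREZID, "all"),
                   ifnotfound = NA)
  print(unmasked)

  # Reverse lookup (symbol to probesets) on the unmasked SYMBOL map.
  sym2probe <- revmap(toggleProbes(mouse4302SYMBOL, "all"))
  print(mget("Zfpm1", sym2probe, ifnotfound = NA))
}
```

The ifnotfound = NA argument just keeps mget from stopping with an error
if one of the keys is missing from the map altogether.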

Now I happen to know (because I just rebuilt the annotation packages
locally with the latest information) that this particular mapping has
been updated in the most recent files from Affymetrix.  Affymetrix now
considers these probes unambiguous and declares that they map to Entrez
Gene ID 22761.  So in the next set of annotation packages (which will
appear in a few weeks, with the next Bioconductor release) there will no
longer be a need to mask these probesets by default.

The Ensembl annotation source that you compared against has the luxury
of updating its annotations at any time, so their annotations are
slightly more current than ours are today.  Being that current has a
disadvantage, however: if you use their annotations you may find that
they change pretty often, and this can make it harder to track down bugs
or to publish your work (since it may not be reproducible).  So here we
compromise and release a new, versioned set of annotations twice a year
(one with each release of Bioconductor).
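
Because the annotations are versioned, it is worth recording which
versions you actually used, both when reporting a problem to the list
and when writing up results.  A minimal sketch (again assuming
mouse4302.db is installed; the variable name info is just illustrative):

```r
# Record the R version and the versions of all attached packages.
info <- sessionInfo()
print(info)

# Version of the annotation package on its own, if it is installed.
if (requireNamespace("mouse4302.db", quietly = TRUE)) {
  print(packageVersion("mouse4302.db"))
}
```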

I hope this helps,


  Marc



On 03/08/2011 04:06 AM, Vojtech Kulvait wrote:
> Hello,
> I am getting a wrong result from this code:
>
> library("mouse4302.db")
> s2p <- revmap(mouse4302SYMBOL);
> contents(s2p["Zfpm1"])["Zfpm1"]
>
> gives me an empty list. But I know there should be
>
> 1423603_at and 1451046_at (based on Ensembl, NetAffx).
>
> I already looked into the databases mouse4302.sqlite and org.Mm.eg.sqlite and this gene is present in both of them, so there is nothing to add. I can't figure out how exactly the database connection works, so please correct this bug.
>
> Thank you.
> Vojtech Kulvait.
>


