[BioC] Re-mapped Affy CDF files

Jenny Drnevich drnevich at uiuc.edu
Wed Jan 11 18:30:56 CET 2006


Hi all,

I looked at the alternative mappings a few months ago after attending a 
seminar given by Stanley Watson, Director of the Mental Health Research 
Institute at the University of Michigan. He recommended that the alternative 
mappings always be used because of the large discrepancies they found 
between Affymetrix's mapping and their own mappings of the probes. I don't know 
whether they have shown that their mappings yield results that are more often 
validated through alternative methodologies, but they do have quite a lot of 
documentation on what they did and why they did it - see the description of 
custom CDF files and their new paper, both linked from the page Jim put in 
his first post. Even if Ensembl or Affymetrix 
updates their annotation based on remapping, the CDFs aren't changed, so 
the summarization and statistical analysis are done using probes that may 
not all map uniquely to the same "gene". What these alternative mappings do 
is remap each probe and then redefine probe sets based on all the probes 
that map to a "gene"; it's these re-groupings that are most 
important. Many of the alternative mappings are subsets of other ones, 
like taking only the first 11 probes from the 3' end in cases where there 
are more than 11 probes, so there are not quite as many alternative 
mappings as it first appears.
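
To make the regrouping step concrete, here is a minimal Python sketch. It is 
not MBNI's actual pipeline: the probe IDs, gene names, and the 
`regroup_probes` helper are all made up for illustration. It assumes each 
probe has already been re-aligned and comes with the list of genes it hits 
and its distance from the 3' end. Probes that hit zero genes or more than one 
gene are dropped, the rest are grouped by gene, and each group can be trimmed 
to the probes nearest the 3' end (11 in the case mentioned above).

```python
from collections import defaultdict


def regroup_probes(probe_hits, max_probes=None):
    """Redefine probe sets from per-probe mappings.

    probe_hits: iterable of (probe_id, genes, dist_from_3prime) tuples, where
    genes lists every gene the probe aligns to (hypothetical input format).
    Returns a dict mapping gene -> probe IDs ordered from the 3' end.
    """
    by_gene = defaultdict(list)
    for probe_id, genes, dist in probe_hits:
        unique = set(genes)
        if len(unique) != 1:
            continue  # drop unmapped probes and probes hitting several genes
        (gene,) = unique
        by_gene[gene].append((dist, probe_id))

    probe_sets = {}
    for gene, members in by_gene.items():
        members.sort()  # smallest distance from the 3' end first
        if max_probes is not None:
            members = members[:max_probes]
        probe_sets[gene] = [probe_id for _, probe_id in members]
    return probe_sets


# Toy data, invented for this example.
hits = [
    ("p1", ["GENE_A"], 120),
    ("p2", ["GENE_A"], 40),
    ("p3", ["GENE_A", "GENE_B"], 300),  # ambiguous, will be dropped
    ("p4", ["GENE_B"], 75),
    ("p5", [], 10),                     # unmapped, will be dropped
]
probe_sets = regroup_probes(hits, max_probes=11)
print(probe_sets)
```

Calling the same function with and without `max_probes` shows why some of the 
alternative CDFs are subsets of others: the trimmed version simply keeps fewer 
probes per "gene".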

I do agree with Jim that coming up with a defensible rationale is 
important, as I was having trouble deciding which mapping might be the best 
to use. Stan Watson would argue that any of them are better than the 
outdated Affymetrix groupings. If Affy did theirs based on Unigene 
clustering, then the new mapping & grouping based on Unigene might be a 
defensible choice. In the end, I succumbed to historical inertia and went 
with Affymetrix's CDF, in part because I do analyses for many organisms, 
and MBNI only has alternative CDFs for human, mouse, and rat. However, I 
was able to get the alternative CDFs to work in Bioconductor with little 
trouble.

As far as validating the genes on the magical "significant list", I did get 
some advice at a recent conference to ALWAYS first check the current probe 
mappings for those significant genes, then only concentrate on those that 
have most or all of their probes where they should be. Does anyone do this 
routinely? Or should we, but don't because it is too time consuming?
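
The check described above could be sketched roughly as follows, in Python. 
Everything here is an assumption made for illustration: the `probe_support` 
and `filter_significant` helpers, the input dictionaries, and especially the 
0.8 cutoff standing in for "most or all" of the probes. A real check would 
need current probe-level alignments from some annotation source.

```python
def probe_support(probes, current_mapping, expected_gene):
    """Fraction of a probe set's probes that currently map to exactly
    the expected gene (and to nothing else)."""
    if not probes:
        return 0.0
    ok = sum(1 for p in probes if current_mapping.get(p) == {expected_gene})
    return ok / len(probes)


def filter_significant(significant, probesets, current_mapping, min_fraction=0.8):
    """Keep only significant probe sets with enough well-mapped probes.

    significant: dict of probe set ID -> gene it is annotated to
    probesets: dict of probe set ID -> list of probe IDs
    current_mapping: dict of probe ID -> set of genes it currently hits
    min_fraction: arbitrary illustrative cutoff, not a recommended value
    """
    return [
        ps for ps, gene in significant.items()
        if probe_support(probesets.get(ps, []), current_mapping, gene) >= min_fraction
    ]


# Toy data, invented for this example.
significant = {"ps_1": "GENE_A", "ps_2": "GENE_B"}
probesets = {
    "ps_1": ["a1", "a2", "a3", "a4"],
    "ps_2": ["b1", "b2", "b3", "b4"],
}
current_mapping = {
    "a1": {"GENE_A"}, "a2": {"GENE_A"}, "a3": {"GENE_A"}, "a4": {"GENE_A"},
    "b1": {"GENE_B"},
    "b2": {"GENE_B", "GENE_C"},  # cross-hybridizing probe
    "b3": {"GENE_C"},            # now maps to a different gene
    # b4 no longer maps anywhere
}
kept = filter_significant(significant, probesets, current_mapping)
print(kept)
```

Once the current mappings are in hand the filter itself is cheap; the time 
consuming part is getting up-to-date probe-level mappings for every array 
type in the first place.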

Cheers,
Jenny


At 08:51 AM 1/11/2006, James W. MacDonald wrote:
>Sean Davis wrote:
> > I'm not sure what their build process is, but doesn't Ensembl do some
> > probe-based mappings?
>
>Maybe. I couldn't find anything obvious in a cursory glance at their
>website.
>
>Anyway, the main question for me is not the number or type of
>alternative mappings that exist for Affy arrays (there are 19 different
>CDFs that the MBNI folks produce, including several based on Ensembl
>mappings). I am more concerned with being able to establish a defensible
>rationale for using a particular mapping.
>
>I guess what we do right now with the Affy CDFs isn't defensible except
>on a historical basis, but the weight of history is pretty strong. For
>instance, attributing significance at an alpha of < 0.05 has no
>rationale AFAIK, but is pretty much written in stone due to precedent.
>
>OTOH, most if not all microarray data are caveat emptor - it is
>incumbent on the end user to take the magical list of differentially
>expressed genes and validate them with an alternative methodology.
>
>Given that state of affairs, is it not reasonable to choose the probe
>mappings that one uses with the same logic that one uses for choosing
>the preferred way of computing expression values?
>
>Jim
>
>
>
>
> >
> > Sean
> >
> >
>
>
>--
>James W. MacDonald
>Affymetrix and cDNA Microarray Core
>University of Michigan Cancer Center
>1500 E. Medical Center Drive
>7410 CCGC
>Ann Arbor MI 48109
>734-647-5623
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/bioconductor

Jenny Drnevich, Ph.D.

Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign

330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801
USA

ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at uiuc.edu


