[BioC] what's really in hgu133plus2.db?

Fri Feb 18 20:58:26 CET 2011

Jim,

Thanks for your response. Understanding exactly where a probeset is located is of fundamental importance, because it is now clear from the ENCODE project that around 90% of the genome sequence is actively transcribed in a regulated way. John Mattick presented an excellent talk introducing this topic at the HGM2007 meeting in Montreal. The question then is: are we measuring mRNA or another (regulatory?) RNA species? The fact that 'orphaned' probesets detect significantly up- or down-regulated transcription is extremely interesting and should not be ignored just because they now map outside 'genes' (whatever they may be: the human GNAS locus generates 59 different transcripts, some of which do not overlap).
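
To make the 'where is it located' check concrete, here is a minimal sketch of flagging probesets that fall outside every annotated gene interval. It assumes probeset and gene coordinates are already on the same genome assembly, and it ignores strand. The names `Interval`, `overlaps` and `classify_probesets`, and all coordinates, are hypothetical and for illustration only; they are not taken from hgu133plus2.db or any real annotation.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def classify_probesets(probesets, genes):
    """Map each probeset ID to the sorted list of gene names it overlaps.

    An empty list marks an 'orphaned' probeset, i.e. one that falls outside
    every gene interval supplied.
    """
    return {
        ps_id: sorted(name for name, g in genes.items() if overlaps(ps, g))
        for ps_id, ps in probesets.items()
    }


# Hypothetical coordinates, for illustration only (not real annotation).
genes = {
    "geneA": Interval("chr1", 1000, 5000),
    "geneB": Interval("chr1", 4800, 9000),
}
probesets = {
    "ps_genic": Interval("chr1", 4900, 4950),
    "ps_orphan": Interval("chr1", 20000, 20050),
    "ps_other_chrom": Interval("chr2", 1200, 1250),
}

hits = classify_probesets(probesets, genes)
orphans = [ps_id for ps_id, names in hits.items() if not names]
```

A probeset that overlaps several genes (as with overlapping transcripts at a complex locus) gets several names, and anything left in `orphans` is a candidate for closer inspection rather than for discarding.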

Dave
Dr David Iles
Institute for Integrative and Comparative Biology
University of Leeds
Leeds LS2 9JT

d.e.iles at leeds.ac.uk

On 18 Feb 2011, at 19:24, James W. MacDonald wrote:

> Hi David,
> 
> On 2/18/2011 11:41 AM, David Iles wrote:
>> Dear All,
>> 
>> Can anyone point me to a URL where I can obtain an overview of the
>> sources of the data incorporated in the current version of
>> hgu133plus2.db? I saw to my horror that the actual probesets are
>> based on a really obsolete human genome assembly (2003), which has
>> changed significantly over the years, as have genes, gene
>> locations, genomic intervals, RefSeq/UniGene entries, etc.
> 
> So what exactly is the question? As you note, the chip was designed in 
> the early 2000s, so was necessarily based on a (now) old version of the 
> UniGene database. That is the downfall of the expression arrays; they 
> are stale almost from the instant they hit the market.
> 
> Since the probesets are based on things that may now be different, it is 
> to a certain extent irrelevant how current the hgu133plus2.db data are, 
> because the probeset --> gene mappings may be suspect. You can update 
> the gene info all you want, but if the probeset doesn't actually measure 
> a given transcript, then what is the point?
> 
> We base the annotation on the probeset --> entrez gene mappings supplied 
> by Affymetrix, which are supposed to be updated regularly. I have not 
> checked that, and we take no stance on the veracity of these 
> mappings, so they are what they are. Any significant results will 
> require close inspection of the probesets to determine if you believe 
> that they measure what they purport to measure.
> 
> As an alternative, you can try the MBNI re-mapped probesets, which both 
> update the mappings and remove replicate probesets (by creating single 
> probesets per gene/transcript/etc). They can be obtained via biocLite, 
> or individually here:
> 
> http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/CDF_download.asp
> 
> Best,
> 
> Jim
> 
> 
>> 
>> Thanks
>> 
>> Dave
>> Dr David Iles
>> Institute for Integrative and Comparative Biology
>> University of Leeds
>> Leeds LS2 9JT
>> 
>> d.e.iles at leeds.ac.uk
>> 
> 
> -- 
> James W. MacDonald, M.S.
> Biostatistician
> Douglas Lab
> University of Michigan
> Department of Human Genetics
> 5912 Buhl
> 1241 E. Catherine St.
> Ann Arbor MI 48109-5618
> 734-615-7826