[BioC] duplicate genes in Affy arrays

Thu Aug 18 13:50:49 CEST 2005

Is there any general procedure for handling duplicate genes in Affy arrays?

For example, for the hu6800 array which has 7129 probe sets,
there are 869 genes that are represented by more than one probe set,
with one gene (ACTB) being represented by 9 probe sets.

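library(annaffy)  # provides aafSymbol()
# X.gnames: character vector of the hu6800 probe set IDs
g.symbols <- aafSymbol(X.gnames, "hu6800")
ug.symbols <- unlist(g.symbols)  # probe sets with no symbol drop out here
length(ug.symbols)               # 6980 (7129 - 6980 = 149 with no symbol)
symbol.usage <- table(ug.symbols)
sum(symbol.usage > 1)            # 869
max(symbol.usage)                # 9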

Ignoring this duplication would seem to invalidate a number of multiple
comparison procedures, since probe sets for the same gene are presumably
not independent tests.  Is it reasonable to average probe set expression
levels for the same gene?  Are there any "pre-processing" routines that
address this issue?
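
To make the averaging idea concrete, here is a minimal base R sketch. It
assumes a matrix of (log-scale) expression values with one row per probe set
and a parallel vector of gene symbols; the probe set IDs, symbols other than
ACTB, and values below are made up for illustration, and whether averaging is
statistically appropriate is exactly the open question.

```r
# Toy expression matrix: rows are probe sets, columns are samples.
# All IDs and values are hypothetical.
expr <- matrix(c( 1,  3,  5,  7,
                  2,  4,  6,  8,
                 10, 10, 10, 10,
                  4,  4,  4,  4),
               nrow = 4, byrow = TRUE,
               dimnames = list(c("ps1_at", "ps2_at", "ps3_at", "ps4_at"),
                               c("s1", "s2", "s3", "s4")))

# Hypothetical gene symbol for each probe set; NA means no symbol.
symbols <- c("ACTB", "ACTB", NA, "GENE1")

# Drop probe sets without a symbol before collapsing.
has.symbol <- !is.na(symbols)

# rowsum() adds up rows that share a group label (groups come back sorted).
sums <- rowsum(expr[has.symbol, , drop = FALSE], group = symbols[has.symbol])

# Number of probe sets per symbol, aligned to the rows of 'sums'.
counts <- table(symbols[has.symbol])

# Per-gene mean: each row of sums divided by its probe set count.
gene.expr <- sums / as.vector(counts[rownames(sums)])
gene.expr
```

The result has one row per gene symbol, so downstream tests would be run on
genes rather than on probe sets.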

The flip side of this question is "Do probe sets with the same gene symbol
really specify the same gene? Does it matter which annotation method is
used to name genes?"
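
One way to probe this, sketched under assumptions: compare the gene symbol
against a second, independent identifier for each probe set and flag symbols
that map to more than one identifier. The annotation table below is entirely
hypothetical (column names, probe set IDs, and identifiers are invented); in
practice the second identifier would come from whichever annotation source is
being compared.

```r
# Hypothetical annotation table: one row per probe set.
annot <- data.frame(
  probe  = c("p1_at", "p2_at", "p3_at", "p4_at", "p5_at"),
  symbol = c("ACTB", "ACTB", "GENE1", "GENE1", "GENE2"),
  id     = c("A1", "A1", "B1", "B2", "C1"),  # made-up second identifier
  stringsAsFactors = FALSE
)

# Count distinct identifiers attached to each symbol.
ids.per.symbol <- tapply(annot$id, annot$symbol,
                         function(x) length(unique(x)))

# Symbols whose probe sets disagree on the second identifier.
ambiguous <- names(ids.per.symbol)[ids.per.symbol > 1]
ambiguous
```

Symbols flagged this way are candidates for probe sets that share a name but
may not measure the same gene.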