[BioC] Are only 2/3 of the Human Gene ST 1.0 probesets associated with a useful annotation?

Nizar Touleimat mohamed.touleimat at uclouvain.be
Wed Jan 6 12:11:28 CET 2010


I am currently working with gene expression data produced with the Human
Gene ST 1.0 array (Affymetrix).
I am trying to identify gene signatures that can be used in predictive
models to discriminate between different kinds of pathological states.
I have already identified a set of signatures (lists of transcript cluster
IDs), but for some of these signatures the features that are not
associated with a gene ID make up more than 50% of the signature.
In my view, this makes it difficult to interpret the signatures and to
compare them with signatures identified from expression data produced
with other kinds of Affymetrix arrays (HG-U133PLUS2, for example).
I downloaded the Human Gene ST 1.0 annotation file
(HuGene-1_0-st-v1.na30.hg19.transcript.csv) from Affymetrix and
extracted some information:
Total number of transcript cluster id: 33257 (29096 when controls are
Number of transcript cluster ids with an assigned gene: 22118
Number of different unique 'genes' assigned to transcript cluster ids: 21460

If I look at the 'mrna_assignment' annotation:
Number of transcript cluster ids with an assigned mRNA: 31269
Number of different unique 'mRNA' assigned to transcript cluster ids: 26715
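
For reference, counts of this kind can be obtained with a short script
along the following lines. This is only a minimal sketch, not the exact
procedure used for the numbers above. It assumes that the annotation file
is a quoted CSV with '#'-prefixed comment lines, that a missing annotation
is written as '---', that multiple assignments are separated by '///' and
their sub-fields by '//', and that the gene symbol is the second sub-field
of 'gene_assignment' while the accession is the first sub-field of
'mrna_assignment'. The function names and the sample rows are hypothetical
and purely illustrative.

```python
import csv
import io

# Made-up rows in the ASSUMED layout of the transcript annotation CSV.
# They are illustrative only and do not come from the real file.
SAMPLE = '''\
#hypothetical header comment
"transcript_cluster_id","gene_assignment","mrna_assignment"
"1","ACC1 // GENEA // description // 1p1 // 101","ACC1 // source // description"
"2","ACC2 // GENEA // description // 1p1 // 101 /// ACC3 // GENEB // description // 2q2 // 102","ACC2 // source // description /// ACC3 // source // description"
"3","---","ACC3 // source // description"
"4","---","---"
'''


def extract_subfield(field, index):
    """Collect sub-field `index` from every '///'-separated assignment."""
    values = set()
    for entry in field.split("///"):
        parts = [part.strip() for part in entry.split("//")]
        if len(parts) > index and parts[index]:
            values.add(parts[index])
    return values


def summarize(handle, missing="---"):
    """Count annotated transcript clusters and distinct genes/mRNAs."""
    # Skip blank lines and '#' comment lines before handing rows to csv.
    lines = (ln for ln in handle if ln.strip() and not ln.startswith("#"))
    total = with_gene = with_mrna = 0
    genes, mrnas = set(), set()
    for row in csv.DictReader(lines):
        total += 1
        if row["gene_assignment"] != missing:
            with_gene += 1
            # Assumption: the gene symbol is the second sub-field.
            genes |= extract_subfield(row["gene_assignment"], 1)
        if row["mrna_assignment"] != missing:
            with_mrna += 1
            # Assumption: the accession is the first sub-field.
            mrnas |= extract_subfield(row["mrna_assignment"], 0)
    return {
        "total": total,
        "with_gene": with_gene,
        "unique_genes": len(genes),
        "with_mrna": with_mrna,
        "unique_mrnas": len(mrnas),
    }


if __name__ == "__main__":
    print(summarize(io.StringIO(SAMPLE)))
```

For the real file, the same function could be called on an open file
handle instead of the in-memory sample. Note that filtering line by line
would break on quoted fields that span several lines, and that counting
"unique genes" by symbol is only one possible definition.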

My questions:
1. If only about 2/3 of the transcript cluster IDs are annotated with
a gene name, what should be done with the non-annotated ones? In
particular, how should they be interpreted when they are selected as
biomarkers, and how can they be compared with biomarkers identified from
other kinds of Affymetrix arrays (HG-U133PLUS2, for example)?
2. Are the 'mrna_assignment' entries interpretable, and is their
interpretation reliable? And, as for question 1, how can biomarkers with
an mRNA assignment be compared to biomarkers from a different array
technology?



M. Nizar Touleimat, PhD
Machine Learning Group
Ecole Polytechnique de Louvain
Université catholique de Louvain
B-1348 Louvain-la-Neuve (Belgium)
Phone (direct): +32-10-479106
Phone (secr.): +32-10-472425
Fax: +32-10-450345
Email: mohamed.touleimat at uclouvain.be
