[BioC] PLANdbAffy + Alternative Exon Annotation +XPS, aroma, oligo, RMAExpress

cstrato cstrato at aon.at
Tue Dec 7 23:04:49 CET 2010

Dear Branko,

Regarding your question of why oligo & XPS require extra annotation, I
can only speak for xps:
In principle, you are right that for probe_set level analysis the
PGF/CLF files are enough. However, I first extract some information
from the probeset annotation file, e.g. the probeset_id, the position
of the probeset_id, and the probeset level (core, extended, full),
which I use to create e.g. the probeset tree. If you are interested in
the details, please have a look at the methods
XExonChip::ImportProbesetAnnotation() and XExonChip::ReadData().
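
As a rough illustration of the kind of information pulled from the
probeset annotation file, here is a minimal Python sketch. It is not the
xps implementation (see the methods named above for that). It assumes a
comma-separated file whose leading comment lines start with "#" and whose
header names the columns "probeset_id" and "level"; the function name
read_probeset_levels and the sample rows are made up for the example.

```python
import csv
import io
from collections import Counter


def read_probeset_levels(handle):
    """Return a dict mapping probeset_id -> level (core/extended/full).

    Assumption: comment lines start with '#', and the first remaining
    line is a header containing 'probeset_id' and 'level'.
    """
    # Drop comment lines so that DictReader sees the real header first.
    data_lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(data_lines)
    return {row["probeset_id"]: row["level"] for row in reader}


# Made-up sample standing in for a real annotation file.
sample = io.StringIO(
    "#%example-comment=made-up\n"
    '"probeset_id","seqname","level"\n'
    '"1001","chr1","core"\n'
    '"1002","chr1","extended"\n'
    '"1003","chr2","full"\n'
    '"1004","chr2","core"\n'
)

levels = read_probeset_levels(sample)
level_counts = Counter(levels.values())  # number of probesets per level
print(level_counts)
```

An annotation file whose header lacks these column names would fail here
with a KeyError, which is the same kind of mismatch that prevents
differently structured annotation files from being imported.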

Best regards

On 12/7/10 3:09 PM, B.Misovic at lumc.nl wrote:
> Dear Roman, all,
> Recently we tried your version of the annotation files for the Gene 1.0
> ST array that your team built from the PLANdbAffy DB. I encountered some
> problems, so I hope you can help.
> You provide nice CDF and Affy PGF/CLF files, but the PGF/CLF files were
> not useful in the Bioconductor packages for Affy Exon/Gene type arrays,
> namely oligo & XPS, as they require an annotation file in csv format. I
> tried the annotation csv file from Affymetrix and after that the one
> from the PLANdbAffy DB. The PLANdbAffy csv file is very different from
> the Affymetrix one, so import is not possible (actually the csv file on
> the website is TAB-delimited instead of comma-delimited, so the problem
> already starts there, and it requires reformatting).
> Christian from XPS was kind enough to inform me that:
>>... PLANdbAffy annotation columns have nothing to do with the Affymetrix
>>annotation columns. Thus xps will not read these annotation files.
>>Alternative annotation files must contain exactly the same columns as
>>the Affymetrix annotation files.
>>For whole genome and exon arrays it is not possible to use only the
>>PGF-files w/o the annotation files, since I extract most of the
>>important information from the probeset-annotation file first, so this
>>file is absolutely essential. For example, column "level" contains the
>>information Core/Extended/Full, see the corresponding annotation README
>>files for an explanation of all columns.
>>xps error you get simply says that their PGF-file does not contain the
>>AFFX controls, so maybe adding the AFFX controls to their PGF-file
>>might help. However, as you mention, they use their own Probesetids,
>>which will not match the Probesetids of the Affymetrix annotation
>>files, thus it may not work anyhow.
>>It is not quite clear to me why they created their own PGF-file. The
>>Affymetrix PGF-file contains only 1-4 probes for each probeset, where
>>each exon consists of one or more probesets, thus the probability that
>>a probe within a probeset is not correct should be pretty small.
>>However, a probeset could be mapped to a wrong exon/gene or no gene at
>>all, so it should be sufficient to correct the Affymetrix annotation files.
> Tools like RMAExpress, EC., and Aroma.affymetrix can work with the CDF
> only. So after using RMAExpress (in command line mode) I did get an
> expression matrix out, but I could not link the 19532 probeset ids to
> the PLANdbAffy annotation csv file to collect basic gene information.
> What I did was first load the full annotation file (not filtered) from
> PLANdbAffy:
> http://affymetrix2.bioinf.fbb.msu.ru/files.html
> and search the 2nd column (Probe_Sets) with the ids after RMA, and I
> found 0... then I tried the 1st column (the Probes) and found 8664...
> but I would expect the opposite situation?
> So Roman, can you please:
> 1) advise how to get the real ids after an RMAExpress run?
> 2) do you plan to build an annotation csv file as Affymetrix does, so
> that other software from Bioconductor (oligo & XPS) can use it?
> 3) comment on Christian's feedback.
> Btw. Christian, how come RMAExpress, EC., and Aroma.affymetrix can work
> with CDFs only and oligo & XPS require extra annotation? From what I
> gather (after peeking into the CDF and PGF files) they show which probes
> belong to each probe_set. So for probe_set level analysis (or more
> exon_like analysis) the PGF/CLF files alone seem to be enough?
> For the bioc list, just to bring attention to this article & DB:
> PLANdbAffy: probe-level annotation database for Affymetrix expression
> microarrays, Ramil N. Nurtdinov et al.
> http://nar.oxfordjournals.org/content/38/suppl_1/D726.full
> http://affymetrix2.bioinf.fbb.msu.ru/
> Maybe some of the bioC experts have comments about it?
> Best,
> Branko
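
One way to diagnose the ID mismatch described in the quoted message is to
count, for every column of the TAB-delimited annotation file, how many of
the IDs from the expression matrix occur in it. The Python sketch below
is only an illustration: the column names "Probes" and "Probe_Sets" are
taken from the message, while the "Gene" column, the sample rows, the IDs
and the function name count_id_hits are hypothetical.

```python
import csv
import io


def count_id_hits(handle, ids, delimiter="\t"):
    """Return {column name: number of distinct `ids` found in that column}."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    # Collect the distinct values seen in each column.
    seen = {name: set() for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            seen[name].add(row[name])
    wanted = set(ids)
    return {name: len(wanted & values) for name, values in seen.items()}


# Hypothetical TAB-delimited sample standing in for the annotation file.
sample = io.StringIO(
    "Probes\tProbe_Sets\tGene\n"
    "7892501\tPS_A\tGENE1\n"
    "7892502\tPS_A\tGENE1\n"
    "7892503\tPS_B\tGENE2\n"
)

# Hypothetical row IDs as they might appear in the expression matrix.
rma_ids = ["7892501", "7892503", "9999999"]

hits = count_id_hits(sample, rma_ids)
print(hits)
```

The column with the most hits shows which identifier type the expression
matrix rows carry. Passing delimiter="\t" also covers the point that the
file is TAB-delimited rather than comma-delimited.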
