[BioC] FW: PLANdbAffy + Alternative Exon Annotation +XPS, aroma, oligo, RMAExpress

Ramil Nurtdinov ramil at bioinf.fbb.msu.ru
Tue Dec 7 16:21:18 CET 2010


Dear colleagues,

My experience with R/Bioconductor and the Affymetrix Human Exon 1.0 ST
array started with the oligo package. Unfortunately, for my 19 HuEx 1.0
arrays R asks for approximately 6-7 GB of memory, whereas the RMA algorithm
in Affymetrix Expression Console takes 40 minutes on my Sony Vaio
notebook. Second, there was no good annotation for this chip in R
except X:Map, my competitor for the paper :))

So I solved the first problem with Expression Console, and for the second
problem we developed PLANdbAffy:
http://nar.oxfordjournals.org/content/38/suppl_1/D726.long

I am now finishing the Ensembl plus hg19 version of the database. I
understand that Bioconductor is widely used in the scientific world, but my
workload is rather heavy because of many new projects.

If somebody gives me the required annotation format, I can produce the
corresponding database summary file.

Yours sincerely,
Ramil Nurtdinov, PhD

On 12/7/10, B.Misovic at lumc.nl <B.Misovic at lumc.nl> wrote:
> Dear Ramil,
>
> I see I forgot to add you to the email below, which I sent to the
> Bioconductor mailing list and our collaborators in Poland, just in
> case you have some comments.
>
> Best,
> Branko
>
> ________________________________
>
> From: Misovic, B. (TOXGEN)
> Sent: 07 December 2010 15:09
> To: 'roman.jaksik at polsl.pl'; 'bioconductor at r-project.org'
> Cc: 'cstrato'
> Subject: PLANdbAffy + Alternative Exon Annotation +XPS, aroma, oligo, RMAExpress
>
>
>
> Dear Roman, all
>
>
>
> Recently we tried your version of the annotation files for the Gene 1.0 ST
> array that your team built from the PLANdbAffy DB. I encountered some
> problems, so I hope you can help.
>
>
>
> You provide nice CDF and Affymetrix PGF/CLF files, but the PGF/CLF files
> were not usable with the Bioconductor packages for Affymetrix Exon/Gene
> arrays, namely oligo and XPS, because they require an annotation file in
> CSV format. I tried the annotation CSV file from Affymetrix and then the
> one from the PLANdbAffy DB. The PLANdbAffy CSV file is very different from
> the Affymetrix one, so import is not possible. (In fact, the "CSV" file on
> the website is TAB-delimited instead of comma-delimited, so the problem
> already starts there, and it requires reformatting.)
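The delimiter problem alone is easy to fix. The sketch below, using only Python's standard `csv` module, re-emits tab-delimited text as comma-delimited CSV, quoting any field that itself contains a comma. The function name `tsv_to_csv` is hypothetical. Note that, per Christian's reply, changing the delimiter does not make the PLANdbAffy file importable by xps, because its columns differ from the Affymetrix ones.

```python
import csv
import io


def tsv_to_csv(tsv_text):
    """Re-emit tab-delimited text as comma-delimited CSV.

    Fields that contain commas or quotes are quoted by csv.writer
    (default QUOTE_MINIMAL), so no data is split incorrectly.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    print(tsv_to_csv("Probes\tProbe_Sets\n101\t2001\n"), end="")
```

For a real file, read it with `open(path, newline="")` and pass its contents (or adapt the function to stream rows).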
>
> Christian from XPS was kind enough to inform me that:
>
>
>> ... PLANdbAffy annotation columns have nothing to do with the
>> Affymetrix annotation columns. Thus xps will not read these annotation
>> files. Alternative annotation files must contain exactly the same
>> columns as the Affymetrix annotation files.
>
>
>
>> For whole-genome and exon arrays it is not possible to use only the
>> PGF files without the annotation files, since I extract most of the
>> important information from the probeset annotation file first, so this
>> file is absolutely essential. For example, column "level" contains the
>> information Core/Extended/Full; see the corresponding annotation README
>> files for an explanation of all columns.
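Christian's requirement (the alternative file must carry the same columns as the Affymetrix one) can be checked mechanically before trying an import. A minimal sketch in Python, assuming the header is the first line that does not start with `#` (Affymetrix annotation CSVs are assumed here to begin with `#%` comment lines). The function name and the column names in the example (`probeset_id`, `seqname`) are illustrative assumptions; only `level` is confirmed by Christian's reply.

```python
import csv
import io


def compare_header(candidate_text, reference_header):
    """Return (missing, extra) for a candidate annotation file.

    missing: reference columns absent from the candidate header.
    extra:   candidate columns not present in the reference.
    The header is taken to be the first non-blank line not starting with '#'.
    """
    header = []
    for line in io.StringIO(candidate_text):
        if line.strip() and not line.startswith("#"):
            header = next(csv.reader([line.rstrip("\r\n")]))
            break
    missing = [c for c in reference_header if c not in header]
    extra = [c for c in header if c not in reference_header]
    return missing, extra


if __name__ == "__main__":
    # Illustrative reference columns; check the Affymetrix README for the real list.
    ref = ["probeset_id", "seqname", "level"]
    print(compare_header('#%comment\n"probeset_id","level"\n1,core\n', ref))
```

An empty `missing` list is necessary but not sufficient: "exactly the same columns" may also imply the same order, which a plain `header == reference_header` comparison would catch.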
>
>
>
>> The xps error you get simply says that their PGF file does not contain
>> the AFFX controls, so maybe adding the AFFX controls to their PGF file
>> might help. However, as you mention, they use their own probeset IDs,
>> which will not match the probeset IDs of the Affymetrix annotation
>> files, so it may not work anyway.
>
>
>
>> It is not quite clear to me why they created their own PGF file. The
>> Affymetrix PGF file contains only 1-4 probes for each probeset, where
>> each exon consists of one or more probesets, thus the probability that
>> a probe within a probeset is not correct should be pretty small.
>> However, a probeset could be mapped to a wrong exon/gene or to no gene
>> at all, so it should be sufficient to correct the Affymetrix annotation
>> files.
>
>
>
> Tools like RMAExpress, Expression Console (EC) and aroma.affymetrix can
> work with the CDF alone. So after running RMAExpress (in command-line
> mode) I did get an expression matrix out, but I could not link the 19532
> probeset IDs to the PLANdbAffy annotation CSV file to collect basic gene
> information. What I did was, first, load the full (unfiltered)
> annotation file from PLANdbAffy:
> http://affymetrix2.bioinf.fbb.msu.ru/files.html
>
> and search the 2nd column (Probe_Sets) for the IDs from the RMA output,
> and I found 0... Then I tried the 1st column (Probes) and found 8664...
> but I would expect the opposite situation?
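The lookup described above (which annotation column do the RMA output IDs actually match?) can be done for every column at once. A minimal sketch in Python, assuming a tab-delimited annotation file with a header row; the column names `Probes` and `Probe_Sets` come from the description above, while the function name and the toy data are hypothetical.

```python
import csv
import io


def count_id_matches(ids, annotation_text, delimiter="\t"):
    """For each column of a delimited table, count how many distinct IDs
    from `ids` occur in that column (exact match after stripping spaces)."""
    wanted = {i.strip() for i in ids}
    reader = csv.reader(io.StringIO(annotation_text), delimiter=delimiter)
    header = next(reader)
    found = {name: set() for name in header}
    for row in reader:
        for name, value in zip(header, row):
            value = value.strip()
            if value in wanted:
                found[name].add(value)
    return {name: len(hits) for name, hits in found.items()}


if __name__ == "__main__":
    table = "Probes\tProbe_Sets\n101\t2001\n102\t2001\n103\t2002\n"
    print(count_id_matches(["101", "102", "999"], table))
```

On the toy table this reports 2 hits in `Probes` and 0 in `Probe_Sets`, the same pattern as observed with the real files, which would suggest that the identifiers in the RMAExpress output are drawn from the same ID space as the `Probes` column rather than `Probe_Sets`.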
>
>
>
> So Roman, can you please:
> 1) advise how to get the real IDs after an RMAExpress run?
> 2) tell us whether you plan to build an annotation CSV file in the
> Affymetrix format, so that other Bioconductor software (oligo and XPS)
> can use it?
> 3) comment on Christian's feedback.
>
>
>
> Btw, Christian, how come RMAExpress, EC and aroma.affymetrix can work
> with CDFs only, while oligo and XPS require extra annotation? From what I
> gather (after peeking into the CDF and PGF files), they show which probes
> belong to each probe set. So for probeset-level analysis (or a more
> exon-like analysis), the PGF/CLF files alone seem to be enough?
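To make the question concrete: the probe-to-probeset grouping can indeed be read from a PGF file alone. A minimal sketch in Python, assuming the common PGF layout in which `#`-prefixed lines are headers, probeset lines have no leading tab, atom lines have one leading tab and probe lines have two, with the ID in the first field. The function name and the IDs in the example are made up.

```python
def probes_by_probeset(pgf_text):
    """Map each probeset ID to the list of probe IDs beneath it.

    Assumes the tab-indented PGF layout: depth 0 = probeset,
    depth 1 = atom, depth 2 = probe; '#' lines are headers.
    """
    mapping = {}
    current = None
    for raw in pgf_text.splitlines():
        if not raw.strip() or raw.startswith("#"):
            continue
        depth = len(raw) - len(raw.lstrip("\t"))  # number of leading tabs
        fields = raw.strip("\t").split("\t")
        if depth == 0:
            current = fields[0]
            mapping[current] = []
        elif depth == 2 and current is not None:
            mapping[current].append(fields[0])
    return mapping


if __name__ == "__main__":
    pgf = (
        "#%header0=probeset_id\ttype\n"
        "#%header1=\tatom_id\n"
        "#%header2=\t\tprobe_id\ttype\n"
        "9001\tmain\n\t1\n\t\t11\tpm:st\n\t\t12\tpm:st\n"
        "9002\tmain\n\t2\n\t\t13\tpm:st\n"
    )
    print(probes_by_probeset(pgf))
```

This grouping is enough to summarize probes into probeset values, but the PGF itself carries no gene assignment or Core/Extended/Full level; per Christian's reply, xps takes that information from the annotation file.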
>
>
>
> For the BioC list, just to draw attention to this article and DB:
>
>
>
> PLANdbAffy: probe-level annotation database for Affymetrix expression
> microarrays, Ramil N. Nurtdinov et al.
>
> http://nar.oxfordjournals.org/content/38/suppl_1/D726.full
>
>
>
> http://affymetrix2.bioinf.fbb.msu.ru/
>
>
>
> Maybe some of the BioC experts have comments about it?
>
>
>
> Best,
>
> Branko
>
>
>
> --------------------------
> Branislav Misovic,
> Department of Toxicogenetics
> Leiden University Medical Center
> Einthovenweg 20, 2333 ZC Leiden
> PO Box 9600, Building 2, Room T3-11
> 2300 RC Leiden
> The Netherlands
> Phone: +31 71 526 9636
> Mob: 0653135855
> E-mail:
> b.misovic at lumc.nl
> braniti at gmail.com
>
>
>



More information about the Bioconductor mailing list