[BioC] Analysis and annotation (full) of Affymetrix Mouse Exon 1.0 ST arrays

Wed Jun 13 20:33:34 CEST 2012

Dear Andreas,

Please note that I am talking only about package xps, which contains its 
own annotation, based on the Affymetrix annotation files, in this case 
the files "MoEx-1_0-st-v1.na32.mm9.probeset.csv" and 
"MoEx-1_0-st-v1.na32.mm9.transcript.csv", respectively. Thus with xps 
you can run rma() at the transcript level and get the transcript annotation.

Package xps first creates a "scheme" file (see e.g. script 
"script4schemes.R") which contains the Affymetrix annotation files for 
probesets and transcripts, including the MoEx 1.0 ST identifiers.
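
As a rough illustration of how transcript IDs such as 4701234 can be mapped to gene symbols once the transcript annotation is available as a table, here is a minimal base-R sketch. It does not use xps. It assumes the annotation has a "transcript_cluster_id" column and a "gene_assignment" column whose entries look like "accession // symbol // title // cytoband // entrez id", with multiple assignments separated by " /// ". The toy rows and the helper parse_assignment() are hypothetical.

```r
# Hypothetical toy rows mimicking the assumed layout of the transcript
# annotation table; all IDs, symbols and values below are made up.
ann <- data.frame(
  transcript_cluster_id = c("4701234", "4701235"),
  gene_assignment = c(
    paste("NM_000001 // GeneA // hypothetical gene A // 1 A1 // 11111",
          "NM_000002 // GeneA // hypothetical gene A // 1 A1 // 11111",
          sep = " /// "),
    "---"  # assumed placeholder for "no gene assignment"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of xps): split one gene_assignment string
# into unique accessions, gene symbols and Entrez IDs.
parse_assignment <- function(x) {
  if (is.na(x) || x == "---") {
    return(list(accession = NA_character_,
                symbol    = NA_character_,
                entrez    = NA_character_))
  }
  entries <- strsplit(x, " /// ", fixed = TRUE)[[1]]
  fields  <- strsplit(entries, " // ", fixed = TRUE)
  pick <- function(i) {
    paste(unique(vapply(fields, function(f) f[i], character(1))),
          collapse = ";")
  }
  list(accession = pick(1), symbol = pick(2), entrez = pick(5))
}

parsed <- lapply(ann$gene_assignment, parse_assignment)
ann$accession <- vapply(parsed, `[[`, character(1), "accession")
ann$symbol    <- vapply(parsed, `[[`, character(1), "symbol")
ann$entrez    <- vapply(parsed, `[[`, character(1), "entrez")

# Look up the annotation for the row names of a (hypothetical)
# transcript-level expression matrix, in the order of its rows.
expr_ids <- c("4701235", "4701234")
hits <- ann[match(expr_ids, ann$transcript_cluster_id),
            c("transcript_cluster_id", "symbol", "accession", "entrez")]
print(hits)
```

Transcripts without a gene assignment simply come back as NA here; whether that is the right treatment depends on what you want to do with unannotated probesets.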

Best regards
Christian

On 6/13/12 7:47 PM, Andreas Heider wrote:
> Yes, you are right!
> rma(target=()) can be used to collapse to transcript or probeset level.
> However, the problem is still there, as I am left with a nice
> ExpressionSet object that has values mapped to transcripts (if I decide
> so) but they are only annotated by something like 4701234, which is a
> probeset/transcript name, for example. Now that wouldn't be a problem
> given that normally such an identifier could be easily translated via
> Bioconductor's annotation packages.
>
> But here comes the most significant part: There is no annotation package
> available that includes MoEx 1.0 ST identifiers!
>
> I am trying to get my package to work on these Exon arrays. And the
> package expects a proper annotation package such as "mouse4302" to
> be attached to the annotation slot of the ExpressionSet.
>
> I'm still puzzled.
>
> 2012/6/13 cstrato <cstrato at aon.at>
>
>     Dear Andreas,
>
>     As Jim already mentioned, package xps is able to preprocess MoExon
>     1.0 ST arrays at the probeset and the gene level, see also my
>     earlier reply to a similar question:
>     https://www.stat.math.ethz.ch/pipermail/bioconductor/2012-June/045958.html
>
>     Best regards
>     Christian
>     _._._._._._._._._._._._._._._.___._._
>     C.h.r.i.s.t.i.a.n   S.t.r.a.t.o.w.a
>     V.i.e.n.n.a           A.u.s.t.r.i.a
>     e.m.a.i.l:        cstrato at aon.at
>     _._._._._._._._._._._._._._._.___._._
>
>     On 6/13/12 4:47 PM, James W. MacDonald wrote:
>
>         Hi Andreas,
>
>         On 6/13/2012 3:14 AM, Andreas Heider wrote:
>
>             Dear mailing list,
>             I know this has been on the list a couple of times, and I think
>             I read it all, but I still don't get it right. So here is my
>             problem:
>
>             I want to be able to work with Mouse Exon 1.0 ST arrays (NOT
>             Mouse Gene 1.0 ST) in a similar fashion to e.g. HG-U133 arrays.
>             That means I ultimately want the data accessible as an
>             ExpressionSet object with a proper Bioconductor annotation
>             assigned. This should include GENE SYMBOLS, RefSeq IDs and
>             ENTREZ IDs.
>
>
>         The problem here is that you want to do something that AFAIK isn't
>         easy to do. The Gene ST arrays allow you to summarize all the probes
>         that interrogate a particular transcript (e.g., all the exon-level
>         probesets are collapsed to transcript level, and then you
>         summarize). However, for the Exon ST arrays that isn't the case,
>         unless there is something in xps to allow for that - I know next to
>         nothing about that package, so Christian Stratowa will have to chime
>         in if I am missing something.
>
>         For the Exon chips, you are always summarizing at the same probeset
>         level, where there are <= 4 probes per probeset, and there can be
>         any number of probesets that interrogate a given exon. Lots of these
>         probesets interrogate regions that aren't even transcribed,
>         according to current knowledge of the genome. When you choose core,
>         extended or full probesets, you are just changing the number of
>         probesets being used, not summarizing at a different level as with
>         the Gene ST chip.
>
>         So when you say you want gene symbols, refseq ids and gene ids, what
>         exactly are you after? If a given probeset is in the intron of a
>         gene do you want to annotate it as being part of that gene? How
>         about if it is in the UTR (or really close to the UTR)? What do you
>         want to do with the probesets where one or more of the probes binds
>         in multiple positions in the genome? These are all questions that
>         the exonmap package tries to consider, and it gets really
>         complicated. That's why Affy went with the Gene ST chips - they
>         unleashed the Exon chips on us and couldn't sell them because people
>         were saying WTF do I do with this thing?
>
>         I don't think there is an easy or obvious answer to your question.
>         If you were to come up with what you think are reasonable answers to
>         my questions, then it wouldn't be much work to extract the chr,
>         start, end from the pd.moex.1.0.st.v1 package, and then use
>         GenomicFeatures (e.g., findOverlaps()) to decide what regions are
>         being interrogated, and annotate from there.
>
>         Best,
>
>         Jim
>
>
>
>             I can import it as an AffyBatch and generate an ExpressionSet
>             with the help of the Xmap/exonmap supplied CDF, but there is no
>             annotation attached to it.
>
>             OR
>
>             I can import the CEL files with the "oligo" package as an
>             Exon Array object and generate an ExpressionSet from it.
>             However, in that case it still has no annotation.
>
>             Surprisingly, the Bioconductor website has all the packages
>             needed to deal with Mouse Gene 1.0 ST arrays, but the
>             information to work with Mouse Exon 1.0 ST arrays seems missing!
>
>             What am I doing wrong here? Has someone else had such problems?
>
>             Thanks in advance for your effort,
>             Andreas
>
>
>             _________________________________________________
>             Bioconductor mailing list
>             Bioconductor at r-project.org
>             https://stat.ethz.ch/mailman/listinfo/bioconductor
>             Search the archives:
>             http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
>