[BioC] Analysis and annotation (full) of Affymetrix Mouse Exon 1.0 ST arrays

James Perkins jperkins at biochem.ucl.ac.uk
Wed Jun 27 16:00:30 CEST 2012


Hi,

I wasn't sure whether this was worth starting a new thread, since
my question is very much related to this thread...

Is there any plan to include the "comprehensive" exon array mappings?

E.g. for rat:

If one goes here

http://www.affymetrix.com/estore/browse/products.jsp?productId=131489&categoryId=35748&productName=GeneChip-Rat-Exon-1.0-ST-Array#1_1

Then goes to the Technical Documentation tab

And downloads the

"Rat Exon 1.0 ST Array Probeset, and Meta Probeset Files, core, full,
extended and comprehensive rn4" data

http://www.affymetrix.com/Auth/support/downloads/library_files/RaEx-1_0-st-v1.r2.dt1.rn4.ps.zip

There are the core/extended/full ps and mps files here.

However there is also a comprehensive mps file.

Full, core and extended are from 2006.

The comprehensive file is from 2010 (and gets updated more regularly), so
perhaps it would be a better file to use for getNetAffx?

Apologies if this has been covered before. I am never sure of what is
the best way to analyse exon array data at the gene level.

Thanks,

Jim




On 13 June 2012 21:37, Benilton Carvalho <beniltoncarvalho at gmail.com> wrote:
>
> please correct the code below to:
>
> eset = rma(raw, target='full') ## or 'core', 'extended' (whatever is available)
>
> and if you want results at the exon level
>
> eset = rma(raw, target='probeset')
> featureData(eset) = getNetAffx(eset, 'probeset')
>
> apologies for the mistake below.
>
> b
>
> On 13 June 2012 20:11, Benilton Carvalho <beniltoncarvalho at gmail.com> wrote:
> > FWIW, remember that you can obtain the contents of the annotation
> > files (the NA32 Affymetrix files) with:
> >
> > library(Biobase)
> > library(oligo)
> > raw = read.celfiles(list.celfiles())
> > eset = rma(raw, target='transcript')
> > featureData(eset) = getNetAffx(eset, 'transcript')
> > head(fData(eset))
> >
> > b
> >
> > On 13 June 2012 15:47, James W. MacDonald <jmacdon at uw.edu> wrote:
> >> Hi Andreas,
> >>
> >>
> >> On 6/13/2012 3:14 AM, Andreas Heider wrote:
> >>>
> >>> Dear mailing list,
> >>> I know this has been on the list a couple of times, and I think I read
> >>> it all, but I still don't get it right. So here is my problem:
> >>>
> >>> I want to be able to work with Mouse Exon 1.0 ST Arrays (NOT Mouse Gene
> >>> 1.0 ST) in a similar fashion to e.g. HG-U133 arrays.
> >>> That means, I want to finally have it accessible as an ExpressionSet
> >>> object with the right Bioconductor annotation assigned. This should
> >>> include GENE SYMBOLS, RefSeq IDs and ENTREZ IDs.
> >>
> >>
> >> The problem here is that you want to do something that AFAIK isn't easy to
> >> do. The Gene ST arrays allow you to summarize all the probes that
> >> interrogate a particular transcript (e.g., all the exon-level probesets are
> >> collapsed to transcript level, and then you summarize). However, for the
> >> Exon ST arrays that isn't the case, unless there is something in xps to
> >> allow for that - I know next to nothing about that package, so Cristian
> >> Stratowa will have to chime in if I am missing something.
> >>
> >> For the Exon chips, you are always summarizing at the same probeset level,
> >> where there are <= 4 probes per probeset, and there can be any number of
> >> probesets that interrogate a given exon. Lots of these probesets interrogate
> >> regions that aren't even transcribed, according to current knowledge of the
> >> genome. When you choose core, extended or full probesets, you are just
> >> changing the number of probesets being used, not summarizing at a different
> >> level as with the Gene ST chip.
> >>
> >> So when you say you want gene symbols, refseq ids and gene ids, what exactly
> >> are you after? If a given probeset is in the intron of a gene do you want to
> >> annotate it as being part of that gene? How about if it is in the UTR (or
> >> really close to the UTR)? What do you want to do with the probesets where
> >> one or more of the probes binds in multiple positions in the genome? These
> >> are all questions that the exonmap package tries to consider, and it gets
> >> really complicated. That's why Affy went with the Gene ST chips - they
> >> unleashed the Exon chips on us and couldn't sell them because people were
> >> saying WTF do I do with this thing?
> >>
> >> I don't think there is an easy or obvious answer to your question. If you
> >> were to come up with what you think are reasonable answers to my questions,
> >> then it wouldn't be much work to extract the chr, start, end from the
> >> pd.moex.1.0.st.v1 package, and then use GenomicFeatures (e.g.,
> >>  findOverlaps()) to decide what regions are being interrogated, and annotate
> >> from there.
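
The overlap step suggested above can be sketched roughly as follows. This is
a minimal illustration on made-up coordinates, not on the real contents of
pd.moex.1.0.st.v1: the probeset IDs, gene symbols and positions are all
hypothetical, and findOverlaps() is taken here from the GenomicRanges
package. How to treat intronic, near-UTR or multi-mapping probesets is still
a decision the analyst has to make before a step like this.

```r
library(GenomicRanges)

## Hypothetical probeset locations (chr, start, end); in practice these
## would be extracted from the platform design package.
probesets <- GRanges(
  seqnames = c("chr1", "chr1", "chr2"),
  ranges = IRanges(start = c(100, 5000, 200), end = c(125, 5025, 225)),
  probeset_id = c("ps1", "ps2", "ps3")
)

## Hypothetical gene models; a real analysis would take these from a
## transcript database for the matching genome build.
genes <- GRanges(
  seqnames = c("chr1", "chr2"),
  ranges = IRanges(start = c(50, 150), end = c(1000, 400)),
  symbol = c("GeneA", "GeneB")
)

## Find every probeset/gene pair whose ranges overlap.
hits <- findOverlaps(probesets, genes)

## Build a simple probeset-to-gene annotation table from the hits.
annot <- data.frame(
  probeset_id = probesets$probeset_id[queryHits(hits)],
  symbol = genes$symbol[subjectHits(hits)],
  stringsAsFactors = FALSE
)
annot
```

Probesets that overlap no gene model (ps2 in this toy example) simply drop
out of the table, which is one possible answer to the "what do you want to do
with those?" questions, but not the only one.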
> >>
> >> Best,
> >>
> >> Jim
> >>
> >>
> >>
> >>>
> >>> I can import it as an AffyBatch and generate an ExpressionSet with the help
> >>> of the Xmap/exonmap supplied CDF, but there is no annotation attached to
> >>> it.
> >>>
> >>> OR
> >>>
> >>> I can import the CEL files with the "oligo" package as an Exon Array object
> >>> and generate an ExpressionSet from it.
> >>> However, in that case it still has no annotation.
> >>>
> >>> Surprisingly, on the Bioconductor website there are all the packages needed
> >>> to deal with Mouse Gene 1.0 ST arrays, but the information to work with
> >>> Mouse Exon 1.0 ST arrays seems to be missing!
> >>>
> >>> What am I doing wrong here? Has someone else had such problems?
> >>>
> >>> Thanks in advance for your effort,
> >>> Andreas
> >>>
> >>>
> >>> _______________________________________________
> >>> Bioconductor mailing list
> >>> Bioconductor at r-project.org
> >>> https://stat.ethz.ch/mailman/listinfo/bioconductor
> >>> Search the archives:
> >>> http://news.gmane.org/gmane.science.biology.informatics.conductor
> >>
> >>
> >> --
> >> James W. MacDonald, M.S.
> >> Biostatistician
> >> University of Washington
> >> Environmental and Occupational Health Sciences
> >> 4225 Roosevelt Way NE, # 100
> >> Seattle WA 98105-6099
> >>
> >>
>