[BioC] Analysis and annotation (full) of Affymetrix Mouse Exon 1.0 ST arrays

Wed Jun 27 17:33:27 CEST 2012

Could you expand on that a little? Do you mean you can change the
level of confidence of the probe set IDs mapping to the Ensembl gene
using biomaRt?

On 27 June 2012 17:30, Andreas Heider <aheider at trm.uni-leipzig.de> wrote:
> Also remember that this will be influenced by your selection of identifiers
> in biomaRt!
>
>
> 2012/6/27 Andreas Heider <aheider at trm.uni-leipzig.de>
>>
>> The AnnMap CDF should take care of that.
>>
>>
>> 2012/6/27 James Perkins <jperkins at biochem.ucl.ac.uk>
>>>
>>> Thanks Andreas! That's really useful information, I will have a look.
>>>
>>> Out of interest, did you look at the distribution of expression levels
>>> for the different probe sets? If you are including all probe sets, I
>>> would guess that a lot of predicted/intronic probe sets that aren't
>>> expressed could bias your gene-level estimate, i.e. if their
>>> proportion is above the breakdown point of the
>>> summarisation/aggregation method.
>>>
>>> Although perhaps the CDF from annmap takes care of that?
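
To make the breakdown-point concern concrete, here is a minimal sketch in base R. All numbers are made up for illustration: a hypothetical gene with 10 probe sets, where expressed ones sit at log2 intensity 9 and unexpressed (e.g. intronic) ones at 3. The median has a breakdown point of 50%, so it holds while unexpressed probe sets are in the minority and fails once they are the majority; the mean is pulled down in both cases.

```r
# Hypothetical gene with n_total probe sets on a log2 scale:
# expressed probe sets are set to 9, unexpressed ones to 3 (toy values).
gene_level <- function(n_unexpressed, n_total = 10) {
  values <- c(rep(3, n_unexpressed), rep(9, n_total - n_unexpressed))
  c(mean = mean(values), median = median(values))
}

few  <- gene_level(4)  # 40% unexpressed: below the median's 50% breakdown point
many <- gene_level(6)  # 60% unexpressed: above it

# Compare the two summaries side by side.
print(rbind(few, many))
```

With 40% unexpressed probe sets the median still reports the expressed level, while with 60% it reports the background level instead.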
>>>
>>> Cheers!
>>>
>>> Jim
>>>
>>> On 27 June 2012 16:45, Andreas Heider <aheider at trm.uni-leipzig.de> wrote:
>>> > Ok, sorry, that was the "short answer". Here comes the longer one:
>>> > 1. get a CDF for the chip from
>>> > http://annmap.picr.man.ac.uk/download/
>>> > 2. load the CEL files using the standard affy package
>>> > 3. assign the downloaded CDF to your AffyBatch object
>>> > 4. calculate RMA or whatever you want (NOTE: this will get you all
>>> > probesets, with no restriction to e.g. "core")
>>> > 5. pull the whole set of identifiers from biomaRt and annotate your
>>> > expression matrix with this information
>>> > 6. "collapse" probesets targeting the same identifier to their mean,
>>> > median or medpolish, whatever suits your needs best, via functions
>>> > such as "recast" or "aggregate"
>>> > 7. have fun with your new expression matrix!
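
Step 6 above can be sketched in base R with aggregate(). This is only an illustration under stated assumptions: the expression matrix and the probeset-to-gene mapping below are toy data with hypothetical names (ps1..ps5, geneX, geneY), standing in for the RMA output of step 4 and the biomaRt table of step 5. The median is used here; mean would be a drop-in replacement for FUN.

```r
# Toy stand-in for an RMA expression matrix (probesets x samples).
expr <- matrix(c(8.1, 8.3,
                 7.9, 8.5,
                 3.0, 3.2,
                 5.0, 5.4,
                 5.2, 5.0),
               ncol = 2, byrow = TRUE,
               dimnames = list(paste0("ps", 1:5), c("sampleA", "sampleB")))

# Hypothetical probeset -> gene mapping, e.g. as pulled from biomaRt.
mapping <- data.frame(probeset = paste0("ps", 1:5),
                      gene = c("geneX", "geneX", "geneX", "geneY", "geneY"),
                      stringsAsFactors = FALSE)

# Look up the gene for every row of the expression matrix.
gene_ids <- mapping$gene[match(rownames(expr), mapping$probeset)]

# Collapse all probesets of a gene to their per-sample median.
collapsed <- aggregate(as.data.frame(expr),
                       by = list(gene = gene_ids),
                       FUN = median)

# Turn the result back into a genes x samples matrix.
gene_expr <- as.matrix(collapsed[, -1])
rownames(gene_expr) <- as.character(collapsed$gene)
print(gene_expr)
```

Probesets that map to no identifier get NA in gene_ids and are dropped by aggregate(), so it is worth checking how many rows are lost that way on real data.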
>>> >
>>> > Hope that helps; I also needed some time to figure out the individual
>>> > steps.
>>> >
>>> >
>>> > 2012/6/27 James Perkins <jperkins at biochem.ucl.ac.uk>
>>> >>
>>> >> Thanks for the pointer Andreas,
>>> >>
>>> >> How did you go from probe sets for a given gene to the transcript
>>> >> level? And how did you know if it was "core", "extended", "full"
>>> >> confidence?
>>> >>
>>> >> Also, how did you summarise the probeset expression levels to make a
>>> >> transcript? Using biomart I get ~25k unique ensembl genes mapping to
>>> >> probe set ids, which is much higher than when I follow the oligo
>>> >> pipeline and perform RMA at core/extended/full level, and use getAffx
>>> >> for annotation.
>>> >>
>>> >> Thanks,
>>> >>
>>> >> Jim
>>> >>
>>> >> On 27 June 2012 16:03, Andreas Heider <aheider at trm.uni-leipzig.de> wrote:
>>> >> > Dear Jim,
>>> >> > I pulled all relevant annotation via biomaRt, as biomaRt has all
>>> >> > mappings of exon array probeset IDs to e.g. ENTREZID or GENESYMBOL.
>>> >> > Then you can go on from that.
>>> >> >
>>> >> > Cheers,
>>> >> > Andreas
>>> >> >
>>> >> >
>>> >> > 2012/6/27 James Perkins <jperkins at biochem.ucl.ac.uk>
>>> >> >>
>>> >> >> Hi,
>>> >> >>
>>> >> >> I wasn't sure if it was worth starting a new thread for this, since
>>> >> >> my question is very much related to this thread...
>>> >> >>
>>> >> >> Is there any plan to include the "comprehensive" exon array mappings?
>>> >> >>
>>> >> >> E.g. for rat:
>>> >> >>
>>> >> >> If one goes here
>>> >> >>
>>> >> >> http://www.affymetrix.com/estore/browse/products.jsp?productId=131489&categoryId=35748&productName=GeneChip-Rat-Exon-1.0-ST-Array#1_1
>>> >> >>
>>> >> >> then to the Technical Documentation tab, and downloads the
>>> >> >>
>>> >> >> "Rat Exon 1.0 ST Array Probeset, and Meta Probeset Files, core,
>>> >> >> full, extended and comprehensive rn4" data
>>> >> >>
>>> >> >> http://www.affymetrix.com/Auth/support/downloads/library_files/RaEx-1_0-st-v1.r2.dt1.rn4.ps.zip
>>> >> >>
>>> >> >> there are the core/extended/full ps and mps files.
>>> >> >>
>>> >> >> However, there is also a comprehensive mps file.
>>> >> >>
>>> >> >> Full, core and extended are from 2006.
>>> >> >>
>>> >> >> The comprehensive is from 2010 (and gets updated more regularly), so
>>> >> >> perhaps it would be a better file to use for getNetAffx?
>>> >> >>
>>> >> >> Apologies if this has been covered before. I am never sure of what is
>>> >> >> the best way to analyse exon array data at the gene level.
>>> >> >>
>>> >> >> Thanks,
>>> >> >>
>>> >> >> Jim
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> On 13 June 2012 21:37, Benilton Carvalho <beniltoncarvalho at gmail.com> wrote:
>>> >> >> >
>>> >> >> > please correct the code below to:
>>> >> >> >
>>> >> >> > eset = rma(raw, target='full') ## or 'core', 'extended' (whatever is available)
>>> >> >> >
>>> >> >> > and if you want results at the exon level
>>> >> >> >
>>> >> >> > eset = rma(raw, target='probeset')
>>> >> >> > featureData(eset) = getNetAffx(eset, 'probeset')
>>> >> >> >
>>> >> >> > apologies for the mistake below.
>>> >> >> >
>>> >> >> > b
>>> >> >> >
>>> >> >> > On 13 June 2012 20:11, Benilton Carvalho <beniltoncarvalho at gmail.com> wrote:
>>> >> >> > > FWIW, remember that you can obtain the contents of the annotation
>>> >> >> > > files (the NA32 Affymetrix files) with:
>>> >> >> > >
>>> >> >> > > library(Biobase)
>>> >> >> > > library(oligo)
>>> >> >> > > raw = read.celfiles(list.celfiles())
>>> >> >> > > eset = rma(raw, target='transcript')
>>> >> >> > > featureData(eset) = getNetAffx(eset, 'transcript')
>>> >> >> > > head(fData(eset))
>>> >> >> > >
>>> >> >> > > b
>>> >> >> > >
>>> >> >> > > On 13 June 2012 15:47, James W. MacDonald <jmacdon at uw.edu> wrote:
>>> >> >> > >> Hi Andreas,
>>> >> >> > >>
>>> >> >> > >>
>>> >> >> > >> On 6/13/2012 3:14 AM, Andreas Heider wrote:
>>> >> >> > >>>
>>> >> >> > >>> Dear mailing list,
>>> >> >> > >>> I know this was on the list a couple of times, and I think I
>>> >> >> > >>> read it all, but actually I still don't get it right. So here
>>> >> >> > >>> is my problem:
>>> >> >> > >>>
>>> >> >> > >>> I want to be able to work with Mouse Exon 1.0 ST Arrays (NOT
>>> >> >> > >>> Mouse Gene 1.0 ST) in a similar fashion to e.g. HG-U133 arrays.
>>> >> >> > >>> That means I want to finally have it accessible as an
>>> >> >> > >>> ExpressionSet object with the right Bioconductor annotation
>>> >> >> > >>> assigned. This should include GENE SYMBOLS, RefSeq IDs and
>>> >> >> > >>> ENTREZ IDs.
>>> >> >> > >>
>>> >> >> > >>
>>> >> >> > >> The problem here is that you want to do something that AFAIK
>>> >> >> > >> isn't easy to do. The Gene ST arrays allow you to summarize all
>>> >> >> > >> the probes that interrogate a particular transcript (e.g., all
>>> >> >> > >> the exon-level probesets are collapsed to transcript level, and
>>> >> >> > >> then you summarize). However, for the Exon ST arrays that isn't
>>> >> >> > >> the case, unless there is something in xps to allow for that -
>>> >> >> > >> I know next to nothing about that package, so Cristian Stratowa
>>> >> >> > >> will have to chime in if I am missing something.
>>> >> >> > >>
>>> >> >> > >> For the Exon chips, you are always summarizing at the same
>>> >> >> > >> probeset level, where there are <= 4 probes per probeset, and
>>> >> >> > >> there can be any number of probesets that interrogate a given
>>> >> >> > >> exon. Lots of these probesets interrogate regions that aren't
>>> >> >> > >> even transcribed, according to current knowledge of the genome.
>>> >> >> > >> When you choose core, extended or full probesets, you are just
>>> >> >> > >> changing the number of probesets being used, not summarizing at
>>> >> >> > >> a different level as with the Gene ST chip.
>>> >> >> > >>
>>> >> >> > >> So when you say you want gene symbols, refseq ids and gene ids,
>>> >> >> > >> what exactly are you after? If a given probeset is in the intron
>>> >> >> > >> of a gene do you want to annotate it as being part of that gene?
>>> >> >> > >> How about if it is in the UTR (or really close to the UTR)? What
>>> >> >> > >> do you want to do with the probesets where one or more of the
>>> >> >> > >> probes binds in multiple positions in the genome? These are all
>>> >> >> > >> questions that the exonmap package tries to consider, and it
>>> >> >> > >> gets really complicated. That's why Affy went with the Gene ST
>>> >> >> > >> chips - they unleashed the Exon chips on us and couldn't sell
>>> >> >> > >> them because people were saying WTF do I do with this thing?
>>> >> >> > >>
>>> >> >> > >> I don't think there is an easy or obvious answer to your
>>> >> >> > >> question. If you were to come up with what you think are
>>> >> >> > >> reasonable answers to my questions, then it wouldn't be much
>>> >> >> > >> work to extract the chr, start, end from the pd.moex.1.0.st.v1
>>> >> >> > >> package, and then use GenomicFeatures (e.g., findOverlaps()) to
>>> >> >> > >> decide what regions are being interrogated, and annotate from
>>> >> >> > >> there.
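
As a rough sketch of the overlap step described above: the coordinates, probeset IDs and gene names below are toy values, not taken from pd.moex.1.0.st.v1, and pulling chr/start/end out of that package is not shown. The sketch assumes the GenomicRanges package is installed; it provides GRanges() and the findOverlaps() method used here.

```r
library(GenomicRanges)

# Toy probeset locations (in practice: chr, start, end from the pd package).
probesets <- GRanges(seqnames = c("chr1", "chr1", "chr2"),
                     ranges = IRanges(start = c(100, 500, 100),
                                      end   = c(125, 525, 125)),
                     probeset_id = c("ps1", "ps2", "ps3"))

# Toy annotated regions (in practice: exons or genes from a transcript database).
exons <- GRanges(seqnames = c("chr1", "chr2"),
                 ranges = IRanges(start = c(90, 50), end = c(200, 150)),
                 gene = c("geneX", "geneY"))

# Find which probesets fall on which annotated region.
hits <- findOverlaps(probesets, exons)

# One row per probeset/region overlap; probesets with no hit are absent.
annotation <- data.frame(probeset = probesets$probeset_id[queryHits(hits)],
                         gene = exons$gene[subjectHits(hits)],
                         stringsAsFactors = FALSE)
print(annotation)
```

Here ps2 overlaps nothing and so gets no gene. How to treat such probesets, and those hitting introns, UTRs or several regions, is exactly the set of decisions raised in the questions above.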
>>> >> >> > >>
>>> >> >> > >> Best,
>>> >> >> > >>
>>> >> >> > >> Jim
>>> >> >> > >>
>>> >> >> > >>
>>> >> >> > >>
>>> >> >> > >>>
>>> >> >> > >>> I can import it as an AffyBatch and generate an ExpressionSet
>>> >> >> > >>> with the help of the Xmap/exonmap supplied CDF, but there is no
>>> >> >> > >>> annotation attached to it.
>>> >> >> > >>>
>>> >> >> > >>> OR
>>> >> >> > >>>
>>> >> >> > >>> I can import the CEL files with the "oligo" package as an Exon
>>> >> >> > >>> Array object and generate an ExpressionSet from it.
>>> >> >> > >>> However, in that case it still has no annotation.
>>> >> >> > >>>
>>> >> >> > >>> Surprisingly, on the Bioconductor website there are all the
>>> >> >> > >>> packages needed to deal with Mouse Gene 1.0 ST arrays, but the
>>> >> >> > >>> information to work with Mouse Exon 1.0 ST arrays seems missing!
>>> >> >> > >>>
>>> >> >> > >>> What am I doing wrong here? Has someone else had such problems?
>>> >> >> > >>>
>>> >> >> > >>> Thanks in advance for your effort,
>>> >> >> > >>> Andreas
>>> >> >> > >>>
>>> >> >> > >>> _______________________________________________
>>> >> >> > >>> Bioconductor mailing list
>>> >> >> > >>> Bioconductor at r-project.org
>>> >> >> > >>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> >> >> > >>> Search the archives:
>>> >> >> > >>>
>>> >> >> > >>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>> >> >> > >>
>>> >> >> > >>
>>> >> >> > >> --
>>> >> >> > >> James W. MacDonald, M.S.
>>> >> >> > >> Biostatistician
>>> >> >> > >> University of Washington
>>> >> >> > >> Environmental and Occupational Health Sciences
>>> >> >> > >> 4225 Roosevelt Way NE, # 100
>>> >> >> > >> Seattle WA 98105-6099
>>> >> >> > >>
>>> >> >> >
>>> >> >
>>> >> >
>>> >
>>> >
>>
>>
>