[BioC] Analysis and annotation (full) of Affymetrix Mouse Exon 1.0 ST arrays

James Perkins jperkins at biochem.ucl.ac.uk
Wed Jun 27 17:49:31 CEST 2012


Great, Thanks, I'll look out for it!

And thanks a lot Andreas for the suggestion of using ensembl exon ids,
that sounds good, thanks for all your help.

Cheers!

Jim

On 27 June 2012 17:44, Benilton Carvalho <beniltoncarvalho at gmail.com> wrote:
> That's correct... the summarisation step does use the MPS... and I'll
> add support for our next release. b
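
To make the role of the MPS file concrete, here is a minimal sketch in base R. It is not oligo's actual implementation. It assumes the usual .mps column layout (probeset_id, transcript_cluster_id, probeset_list, probe_count); all IDs and expression values are made up, and the per-cluster median is only a stand-in for the median-polish fit that rma() performs on the probes themselves.

```r
# Toy stand-in for an Affymetrix meta-probeset (.mps) table.
# Column layout is assumed; IDs and counts are invented for illustration.
# (Real files start with "#%" header lines, so they would need comment.char = "#".)
mps_text <- "probeset_id\ttranscript_cluster_id\tprobeset_list\tprobe_count
7000001\t7000001\t5000001 5000002 5000003\t12
7000002\t7000002\t5000004 5000005\t8"

mps <- read.delim(text = mps_text, colClasses = "character",
                  stringsAsFactors = FALSE)

# Expand the space-separated probeset_list into one row per probeset.
ps_lists <- strsplit(mps$probeset_list, " ", fixed = TRUE)
ps2tc <- data.frame(
  probeset_id           = unlist(ps_lists),
  transcript_cluster_id = rep(mps$transcript_cluster_id, lengths(ps_lists)),
  stringsAsFactors      = FALSE
)

# Made-up exon-level (probeset-level) log2 values for two samples.
exon_level <- matrix(
  c(8.0, 8.2, 7.9, 5.1, 5.3,
    8.4, 8.6, 8.1, 6.0, 6.2),
  ncol = 2,
  dimnames = list(ps2tc$probeset_id, c("sample1", "sample2"))
)

# Collapse probesets to transcript clusters. The median is a placeholder,
# NOT the summarisation that rma(target = ...) actually carries out.
groups <- ps2tc$transcript_cluster_id[match(rownames(exon_level),
                                            ps2tc$probeset_id)]
gene_level <- apply(exon_level, 2, function(x) tapply(x, groups, median))
print(gene_level)
```

Swapping in a different MPS table (core, extended, full or comprehensive) only changes which probesets end up in each group, which is why the choice matters at the rma(target=) step.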
>
> On 27 June 2012 16:37, James Perkins <jperkins at biochem.ucl.ac.uk> wrote:
>> Sorry, I meant at the rma(target=) level, not the getNetAffx level,
>> which I *assume* uses the mps files to map between ps and transcripts?
>>
>> Cheers,
>>
>> Jim
>>
>>
>> On 27 June 2012 17:27, Benilton Carvalho <beniltoncarvalho at gmail.com> wrote:
>>> Hi Jim,
>>>
>>> I'll make sure to add the comprehensive MPS as soon as I get more info
>>> about it from the specialists...
>>>
>>> However, note that the contents of the MPS files are not used by
>>> getNetAffx(), which only uses the probeset/transcript annotation
>>> file...
>>>
>>> Thanks,
>>>
>>> benilton
>>>
>>> On 27 June 2012 15:00, James Perkins <jperkins at biochem.ucl.ac.uk> wrote:
>>>> Hi,
>>>>
>>>> I wasn't sure whether this was worth starting a new thread, since
>>>> my question is very much related to this one...
>>>>
>>>> Is there any plan to include the "comprehensive" exon array mappings?
>>>>
>>>> E.g. for rat:
>>>>
>>>> If one goes here
>>>>
>>>> http://www.affymetrix.com/estore/browse/products.jsp?productId=131489&categoryId=35748&productName=GeneChip-Rat-Exon-1.0-ST-Array#1_1
>>>>
>>>> Then to Technical Documentation tab
>>>>
>>>> And downloads the
>>>>
>>>> "Rat Exon 1.0 ST Array Probeset, and Meta Probeset Files, core, full,
>>>> extended and comprehensive rn4" data
>>>>
>>>> http://www.affymetrix.com/Auth/support/downloads/library_files/RaEx-1_0-st-v1.r2.dt1.rn4.ps.zip
>>>>
>>>> There are the core/extended/full ps and mps files here.
>>>>
>>>> However there is also a comprehensive mps file.
>>>>
>>>> Full, core and extended are from 2006.
>>>>
>>>> The comprehensive file is from 2010 (and gets updated more regularly),
>>>> so perhaps it would be a better file to use for getNetAffx?
>>>>
>>>> Apologies if this has been covered before. I am never sure of what is
>>>> the best way to analyse exon array data at the gene level.
>>>>
>>>> Thanks,
>>>>
>>>> Jim
>>>>
>>>>
>>>>
>>>>
>>>> On 13 June 2012 21:37, Benilton Carvalho <beniltoncarvalho at gmail.com> wrote:
>>>>>
>>>>> please correct the code below to:
>>>>>
>>>>> eset = rma(raw, target='full') ## or 'core', 'extended' (whatever is available)
>>>>>
>>>>> and if you want results at the exon level
>>>>>
>>>>> eset = rma(raw, target='probeset')
>>>>> featureData(eset) = getNetAffx(raw, 'probeset')
>>>>>
>>>>> apologies for the mistake below.
>>>>>
>>>>> b
>>>>>
>>>>> On 13 June 2012 20:11, Benilton Carvalho <beniltoncarvalho at gmail.com> wrote:
>>>>> > FWIW, remember that you can obtain the contents of the annotation
>>>>> > files (the NA32 Affymetrix files) with:
>>>>> >
>>>>> > library(Biobase)
>>>>> > library(oligo)
>>>>> > raw = read.celfiles(list.celfiles())
>>>>> > eset = rma(raw, target='transcript')
>>>>> > featureData(eset) = getNetAffx(eset, 'transcript')
>>>>> > head(fData(eset))
>>>>> >
>>>>> > b
>>>>> >
>>>>> > On 13 June 2012 15:47, James W. MacDonald <jmacdon at uw.edu> wrote:
>>>>> >> Hi Andreas,
>>>>> >>
>>>>> >>
>>>>> >> On 6/13/2012 3:14 AM, Andreas Heider wrote:
>>>>> >>>
>>>>> >>> Dear mailing list,
>>>>> >>> I know this has come up on the list a couple of times, and I think I
>>>>> >>> have read it all, but I still don't get it right. So here is my problem:
>>>>> >>>
>>>>> >>> I want to be able to work with Mouse Exon 1.0 ST Arrays (NOT Mouse Gene
>>>>> >>> 1.0 ST) in a similar fashion to, e.g., HG-U133 arrays.
>>>>> >>> That means I want to end up with an ExpressionSet object that has the
>>>>> >>> right Bioconductor annotation attached. This should include GENE
>>>>> >>> SYMBOLS, RefSeq IDs and ENTREZ IDs.
>>>>> >>
>>>>> >>
>>>>> >> The problem here is that you want to do something that AFAIK isn't easy to
>>>>> >> do. The Gene ST arrays allow you to summarize all the probes that
>>>>> >> interrogate a particular transcript (e.g., all the exon-level probesets are
>>>>> >> collapsed to transcript level, and then you summarize). However, for the
>>>>> >> Exon ST arrays that isn't the case, unless there is something in xps to
>>>>> >> allow for that - I know next to nothing about that package, so Cristian
>>>>> >> Stratowa will have to chime in if I am missing something.
>>>>> >>
>>>>> >> For the Exon chips, you are always summarizing at the same probeset level,
>>>>> >> where there are <= 4 probes per probeset, and there can be any number of
>>>>> >> probesets that interrogate a given exon. Lots of these probesets interrogate
>>>>> >> regions that aren't even transcribed, according to current knowledge of the
>>>>> >> genome. When you choose core, extended or full probesets, you are just
>>>>> >> changing the number of probesets being used, not summarizing at a different
>>>>> >> level as with the Gene ST chip.
>>>>> >>
>>>>> >> So when you say you want gene symbols, refseq ids and gene ids, what exactly
>>>>> >> are you after? If a given probeset is in the intron of a gene do you want to
>>>>> >> annotate it as being part of that gene? How about if it is in the UTR (or
>>>>> >> really close to the UTR)? What do you want to do with the probesets where
>>>>> >> one or more of the probes binds in multiple positions in the genome? These
>>>>> >> are all questions that the exonmap package tries to consider, and it gets
>>>>> >> really complicated. That's why Affy went with the Gene ST chips - they
>>>>> >> unleashed the Exon chips on us and couldn't sell them because people were
>>>>> >> saying WTF do I do with this thing?
>>>>> >>
>>>>> >> I don't think there is an easy or obvious answer to your question. If you
>>>>> >> were to come up with what you think are reasonable answers to my questions,
>>>>> >> then it wouldn't be much work to extract the chr, start, end from the
>>>>> >> pd.moex.1.0.st.v1 package, and then use GenomicFeatures (e.g.,
>>>>> >>  findOverlaps()) to decide what regions are being interrogated, and annotate
>>>>> >> from there.
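
The overlap idea above could look roughly like the sketch below. It uses toy coordinates rather than the real pd.moex.1.0.st.v1 contents, calls findOverlaps() through GenomicRanges, and the gene ranges, probeset IDs and symbols are all invented for illustration. It also sidesteps the hard questions (introns, UTRs, multi-mapping probes) by annotating any probeset that touches a gene range.

```r
library(GenomicRanges)

# Hypothetical probeset locations (in practice: chr/start/end taken from
# the platform design package).
probesets <- GRanges(
  seqnames    = c("chr1", "chr1", "chr2"),
  ranges      = IRanges(start = c(100, 500, 100), end = c(125, 525, 125)),
  probeset_id = c("ps1", "ps2", "ps3")
)

# Hypothetical gene models (in practice: from an annotation source of
# your choice).
genes <- GRanges(
  seqnames = c("chr1", "chr2"),
  ranges   = IRanges(start = c(50, 1000), end = c(300, 2000)),
  symbol   = c("GeneA", "GeneB")
)

# Any overlap at all counts as a hit here; that policy is an assumption.
hits <- findOverlaps(probesets, genes)

annot <- data.frame(
  probeset_id      = probesets$probeset_id[queryHits(hits)],
  symbol           = genes$symbol[subjectHits(hits)],
  stringsAsFactors = FALSE
)
print(annot)
```

Probesets that hit no gene range (ps2 and ps3 in this toy data) simply drop out of annot, so deciding what to do with them is still up to the analyst.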
>>>>> >>
>>>>> >> Best,
>>>>> >>
>>>>> >> Jim
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>>
>>>>> >>> I can import it as an AffyBatch and generate an ExpressionSet with the help
>>>>> >>> of the Xmap/exonmap supplied CDF, but there is no annotation attached to
>>>>> >>> it.
>>>>> >>>
>>>>> >>> OR
>>>>> >>>
>>>>> >>> I can import the CEL files with the "oligo" package as an Exon Array object
>>>>> >>> and generate an ExpressionSet from it.
>>>>> >>> However, in that case it still has no annotation.
>>>>> >>>
>>>>> >>> Surprisingly, the Bioconductor website has all the packages needed to
>>>>> >>> deal with Mouse Gene 1.0 ST arrays, but the information needed to work
>>>>> >>> with Mouse Exon 1.0 ST arrays seems to be missing!
>>>>> >>>
>>>>> >>> What am I doing wrong here? Has someone else had such problems?
>>>>> >>>
>>>>> >>> Thanks in advance for your effort,
>>>>> >>> Andreas
>>>>> >>>
>>>>> >>> _______________________________________________
>>>>> >>> Bioconductor mailing list
>>>>> >>> Bioconductor at r-project.org
>>>>> >>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> >>> Search the archives:
>>>>> >>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>> >>
>>>>> >>
>>>>> >> --
>>>>> >> James W. MacDonald, M.S.
>>>>> >> Biostatistician
>>>>> >> University of Washington
>>>>> >> Environmental and Occupational Health Sciences
>>>>> >> 4225 Roosevelt Way NE, # 100
>>>>> >> Seattle WA 98105-6099
>>>>> >>
>>>>> >>
>>>>>