[BioC] Analysis and annotation (full) of Affymetrix Mouse Exon 1.0 ST arrays
Benilton Carvalho
beniltoncarvalho at gmail.com
Wed Jun 13 21:37:39 CEST 2012
please correct the code below to:
eset = rma(raw, target='full') ## or 'core', 'extended' (whatever is available)
and if you want results at the exon level
eset = rma(raw, target='probeset')
featureData(eset) = getNetAffx(eset, 'probeset')
apologies for the mistake below.
b
On 13 June 2012 20:11, Benilton Carvalho <beniltoncarvalho at gmail.com> wrote:
> FWIW, remember that you can obtain the contents of the annotation
> files (the NA32 Affymetrix files) with:
>
> library(Biobase)
> library(oligo)
> raw = read.celfiles(list.celfiles())
> eset = rma(raw, target='transcript')
> featureData(eset) = getNetAffx(eset, 'transcript')
> head(fData(eset))
>
> b
>
> On 13 June 2012 15:47, James W. MacDonald <jmacdon at uw.edu> wrote:
>> Hi Andreas,
>>
>>
>> On 6/13/2012 3:14 AM, Andreas Heider wrote:
>>>
>>> Dear mailing list,
>>> I know this has come up on the list a couple of times, and I think I have
>>> read it all, but I still can't get it right. So here is my problem:
>>>
>>> I want to be able to work with Mouse Exon 1.0 ST Arrays (NOT Mouse Gene
>>> 1.0 ST) in a similar fashion to e.g. HG-U133 arrays.
>>> That means I ultimately want the data accessible as an ExpressionSet
>>> object with the right Bioconductor annotation assigned. This should
>>> include GENE SYMBOLS, RefSeq IDs and ENTREZ IDs.
>>
>>
>> The problem here is that you want to do something that AFAIK isn't easy to
>> do. The Gene ST arrays allow you to summarize all the probes that
>> interrogate a particular transcript (e.g., all the exon-level probesets are
>> collapsed to transcript level, and then you summarize). However, for the
>> Exon ST arrays that isn't the case, unless there is something in xps to
>> allow for that - I know next to nothing about that package, so Cristian
>> Stratowa will have to chime in if I am missing something.
>>
>> For the Exon chips, you are always summarizing at the same probeset level,
>> where there are <= 4 probes per probeset, and there can be any number of
>> probesets that interrogate a given exon. Lots of these probesets interrogate
>> regions that aren't even transcribed, according to current knowledge of the
>> genome. When you choose core, extended or full probesets, you are just
>> changing the number of probesets being used, not summarizing at a different
>> level as with the Gene ST chip.
>>
>> So when you say you want gene symbols, refseq ids and gene ids, what exactly
>> are you after? If a given probeset is in the intron of a gene do you want to
>> annotate it as being part of that gene? How about if it is in the UTR (or
>> really close to the UTR)? What do you want to do with the probesets where
>> one or more of the probes binds in multiple positions in the genome? These
>> are all questions that the exonmap package tries to consider, and it gets
>> really complicated. That's why Affy went with the Gene ST chips - they
>> unleashed the Exon chips on us and couldn't sell them because people were
>> saying WTF do I do with this thing?
>>
>> I don't think there is an easy or obvious answer to your question. If you
>> were to come up with what you think are reasonable answers to my questions,
>> then it wouldn't be much work to extract the chr, start, end from the
>> pd.moex.1.0.st.v1 package, and then use GenomicFeatures (e.g.,
>> findOverlaps()) to decide what regions are being interrogated, and annotate
>> from there.
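
A minimal sketch of the overlap step described above, assuming the
probeset coordinates and the gene models have already been turned into
GRanges objects. All coordinates, probeset IDs and gene symbols here are
made-up toy values; in practice the probeset positions would come from
pd.moex.1.0.st.v1 and the gene ranges from a transcript database, and
neither extraction is shown.

```r
library(GenomicRanges)

## Hypothetical probeset locations (toy values, not real annotation)
probesets <- GRanges(
    seqnames = c("chr1", "chr1", "chr2"),
    ranges = IRanges(start = c(100, 5000, 300), end = c(125, 5025, 325)),
    probeset_id = c("ps1", "ps2", "ps3")
)

## Hypothetical gene spans, introns included (toy values)
genes <- GRanges(
    seqnames = c("chr1", "chr2"),
    ranges = IRanges(start = c(50, 1000), end = c(1000, 2000)),
    symbol = c("GeneA", "GeneB")
)

## Pair every probeset with every gene span it overlaps
hits <- findOverlaps(probesets, genes)

## One row per (probeset, gene) overlap; probesets with no hit are dropped
annot <- data.frame(
    probeset_id = probesets$probeset_id[queryHits(hits)],
    symbol = genes$symbol[subjectHits(hits)]
)
annot
```

Because the gene ranges here span introns, an intronic probeset would be
assigned to the gene. Using exon ranges instead, or a minimum overlap, is
one way to encode different answers to the questions raised above.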
>>
>> Best,
>>
>> Jim
>>
>>
>>
>>>
>>> I can import it as an AffyBatch and generate an ExpressionSet with the help
>>> of the Xmap/exonmap supplied CDF, but there is no annotation attached to
>>> it.
>>>
>>> OR
>>>
>>> I can import the CEL files with the "oligo" package as an Exon Array object
>>> and generate an ExpressionSet from it.
>>> However, in that case it still has no annotation.
>>>
>>> Surprisingly, the Bioconductor website has all the packages needed to
>>> deal with Mouse Gene 1.0 ST arrays, but the information needed to work
>>> with Mouse Exon 1.0 ST arrays seems to be missing!
>>>
>>> What am I doing wrong here? Has someone else had such problems?
>>>
>>> Thanks in advance for your effort,
>>> Andreas
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>> --
>> James W. MacDonald, M.S.
>> Biostatistician
>> University of Washington
>> Environmental and Occupational Health Sciences
>> 4225 Roosevelt Way NE, # 100
>> Seattle WA 98105-6099
>>
>>