[BioC] Analysis and annotation (full) of Affymetrix Mouse Exon 1.0 ST arrays

Wed Jun 13 16:47:03 CEST 2012

Hi Andreas,

On 6/13/2012 3:14 AM, Andreas Heider wrote:
> Dear mailing list,
> I know this was on the list a couple of times, and I think I read it all, but
> actually I still don't get it right. So here is my problem:
>
> I want to be able to work with Mouse Exon 1.0 ST Arrays (NOT Mouse Gene 1.0
> ST) in a similar fashion to e.g. HG-U133 arrays.
> That means, I want to finally have it accessible as an ExpressionSet object
> with the right Bioconductor annotation assigned. This should include GENE
> SYMBOLS, RefSeq IDs and ENTREZ IDs.

The problem here is that you want to do something that AFAIK isn't easy 
to do. The Gene ST arrays allow you to summarize all the probes that 
interrogate a particular transcript (i.e., all the exon-level probesets 
are collapsed to transcript level, and then you summarize). However, for 
the Exon ST arrays that isn't the case, unless there is something in xps 
to allow for that - I know next to nothing about that package, so 
Cristian Stratowa will have to chime in if I am missing something.

For the Exon chips, you are always summarizing at the same probeset 
level, where there are <= 4 probes per probeset, and there can be any 
number of probesets that interrogate a given exon. Lots of these 
probesets interrogate regions that aren't even transcribed, according to 
current knowledge of the genome. When you choose core, extended or full 
probesets, you are just changing the number of probesets being used, not 
summarizing at a different level as with the Gene ST chip.
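To make the difference in summarization level concrete, here is a minimal 
sketch in plain Python. It is only an illustration: the probe, probeset and 
transcript IDs and the log2 intensities are made up, and a plain median 
stands in for whatever summarization method (e.g., RMA) you would really use.

```python
from collections import defaultdict
from statistics import median

# Toy log2 intensities: (probe_id, probeset_id, transcript_id, value).
# All identifiers and values are invented for illustration.
probes = [
    ("p1", "ps1", "tx1", 8.0),
    ("p2", "ps1", "tx1", 8.5),
    ("p3", "ps2", "tx1", 6.0),
    ("p4", "ps2", "tx1", 6.5),
    ("p5", "ps3", "tx2", 10.0),
    ("p6", "ps3", "tx2", 10.5),
]

def summarize(rows, key_index):
    """Group probe values by the ID in column key_index and take the median.

    The median is a stand-in for a real summarization method.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[3])
    return {key: median(values) for key, values in groups.items()}

# Exon ST style: one value per probeset.
probeset_level = summarize(probes, 1)

# Gene ST style: probesets collapsed to the transcript, then summarized.
transcript_level = summarize(probes, 2)

print(probeset_level)
print(transcript_level)
```

In this toy data ps1 and ps2 both belong to tx1, so the transcript-level 
result has two rows where the probeset-level result has three. Switching 
between core, extended and full only changes which rows go into the 
probeset-level table.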

So when you say you want gene symbols, refseq ids and gene ids, what 
exactly are you after? If a given probeset is in the intron of a gene do 
you want to annotate it as being part of that gene? How about if it is 
in the UTR (or really close to the UTR)? What do you want to do with the 
probesets where one or more of the probes binds in multiple positions in 
the genome? These are all questions that the exonmap package tries to 
consider, and it gets really complicated. That's why Affy went with the 
Gene ST chips - they unleashed the Exon chips on us and couldn't sell 
them because people were saying WTF do I do with this thing?

I don't think there is an easy or obvious answer to your question. If 
you were to come up with what you think are reasonable answers to my 
questions, then it wouldn't be much work to extract the chr, start, end 
from the pd.moex.1.0.st.v1 package, and then use GenomicFeatures (e.g., 
findOverlaps()) to decide what regions are being interrogated, and 
annotate from there.
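
The overlap step might look roughly like the sketch below. This is plain 
Python showing the interval logic only, not the findOverlaps() API. The 
Region type, the annotate() helper, the flank parameter and all coordinates 
are hypothetical, and strand is ignored. The flank argument is one possible 
way to encode an answer to the "really close to the UTR" question.

```python
from typing import NamedTuple

class Region(NamedTuple):
    """A named genomic interval (hypothetical helper type)."""
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

def overlaps(a, b, flank=0):
    """True if a and b share a chromosome and a lies within flank bases of b.

    Strand is ignored in this sketch.
    """
    return (a.chrom == b.chrom
            and a.start <= b.end + flank
            and b.start - flank <= a.end)

def annotate(probesets, features, flank=0):
    """Map each probeset name to the names of the features it overlaps."""
    return {
        ps.name: [f.name for f in features if overlaps(ps, f, flank)]
        for ps in probesets
    }

# Invented exon coordinates; real ones would come from a gene model.
exons = [
    Region("GeneA", "chr1", 100, 200),
    Region("GeneA", "chr1", 400, 500),
    Region("GeneB", "chr2", 1000, 1200),
]

# Invented probeset coordinates; real ones would be the chr, start, end
# extracted from the platform design package.
probesets = [
    Region("ps_exonic", "chr1", 150, 175),
    Region("ps_intronic", "chr1", 300, 325),
    Region("ps_near_utr", "chr1", 505, 530),
    Region("ps_other_chr", "chr2", 1100, 1125),
]

strict = annotate(probesets, exons)             # exonic overlap only
relaxed = annotate(probesets, exons, flank=10)  # allow 10 bases of slack

print(strict)
print(relaxed)
```

With the strict rule the intronic and near-UTR probesets get no annotation. 
With a 10-base flank the near-UTR probeset is assigned to GeneA while the 
intronic one still is not. Probesets whose probes map to multiple positions 
would need an extra filtering rule that this sketch does not cover.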

Best,

Jim

>
> I can import it as an AffyBatch and generate an ExpressionSet with the help
> of the Xmap/exonmap supplied CDF, but there is no annotation attached to it.
>
> OR
>
> I can import the CEL files with the "oligo" package as an Exon Array object
> and generate an ExpressionSet from it.
> However, in that case it still has no annotation.
>
> Surprisingly on the Bioconductor website there are all packages needed to
> deal with Mouse Gene 1.0 ST arrays but the information to work with Mouse
> Exon 1.0 ST arrays seems missing!
>
> What am I doing wrong here? Has someone else had such problems?
>
> Thanks in advance for your effort,
> Andreas

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099