[BioC] Analysis of Affymetrix Mouse Gene 2.0 ST arrays

Wed Mar 6 16:17:04 CET 2013

Hi Jim,

thank you for your helpful reply. I have a few follow-up questions.
> 
> I should throw in my obligatory cautionary statement about summarizing 
> Gene ST data at the probeset (as compared to the transcript) level. If 
> you look at the number of probes/probeset, there are a huge number with 
> < 4 probes. So hypothetically you can do this, but I wouldn't.

I am a bit confused about transcript clusters and probesets. In the MoGene-2_0-st-v1.na33.mm10.transcript.csv file, each transcript cluster corresponds to exactly one probeset. But from your email it sounds like there are more probesets than transcript clusters; I assume these are stored in a different file? Unfortunately, the structure of the Affymetrix web site is a mystery to me. Without your direct link I would never have found the transcript annotation file, so I have no way of browsing the other annotation files to better understand what is going on.

Why is there a distinction between transcript clusters and probesets in the first place? I understand that it's useful to be able to group probes dynamically (based on our state of knowledge about a locus). If this grouping is defined as the transcript cluster, what is the definition of a probeset?

Am I right in assuming that if I build my annotation using the MoGene-2_0-st-v1.na33.mm10.transcript.csv file, I essentially commit to analyzing my data at the transcript level?
> 
> library(AnnotationForge)
> library(mouse.db0)
> library(org.Mm.eg.db)
> makeDBPackage("MOUSECHIP_DB",
> affy=TRUE,
> prefix="mogene20sttranscriptcluster",
> fileName="MoGene-2_0-st-v1.na33.mm10.transcript.csv",
> outputDir = ".",
> version="2.11.1",
> manufacturer = "Affymetrix",
> chipName = "Human Gene 2.1 ST Array",
> manufacturerUrl = "http://www.affymetrix.com",
> author = "Kamila Naxerova",
> maintainer = "Kamila Naxerova <naxerova at fas.harvard.edu>")
> 
> 

Any thoughts on this error message?

> makeDBPackage("MOUSECHIP_DB",
+ affy=TRUE,
+ prefix="mogene20sttranscriptcluster",
+ fileName="MoGene-2_0-st-v1.na33.mm10.transcript.csv",
+ outputDir = ".",
+ version="2.11.1",
+ manufacturer = "Affymetrix",
+ chipName = "Mouse Gene 2.0 ST Array",
+ manufacturerUrl = "http://www.affymetrix.com",
+ author = "Kamila Naxerova",
+ maintainer = "Kamila Naxerova <naxerova at fas.harvard.edu>")
Error in `[.data.frame`(csvFile, , GenBankIDName) : 
  undefined columns selected
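
The error seems to say that makeDBPackage is selecting a column (whatever name is held in GenBankIDName) that the CSV file does not contain. I do not know which column name it expects, so as a first check I would simply list the columns the file actually has. Below is a minimal sketch of that check. The helper list_csv_columns is a made-up name, not part of AnnotationForge, and the toy file with its three column names is only a stand-in so the snippet runs without the real download; I am also assuming the Affymetrix file begins with "#" comment lines.

```r
# Hypothetical helper (not part of AnnotationForge): return the column names
# of an annotation CSV, skipping any leading lines that start with "#".
list_csv_columns <- function(path) {
  header <- read.csv(path, comment.char = "#", nrows = 1,
                     check.names = FALSE, stringsAsFactors = FALSE)
  colnames(header)
}

# Toy stand-in for the real file; the metadata line and column names
# below are illustrative assumptions, not copied from the Affymetrix CSV.
toy <- tempfile(fileext = ".csv")
writeLines(c(
  "#%example-header=not real Affymetrix metadata",
  '"transcript_cluster_id","probeset_id","gene_assignment"',
  '"1","1","---"'
), toy)

cols <- list_csv_columns(toy)
print(cols)
```

On the real file the call would be list_csv_columns("MoGene-2_0-st-v1.na33.mm10.transcript.csv"), and the result could then be compared against whatever column makeDBPackage is looking for.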

> sessionInfo()
R version 2.15.3 (2013-03-01)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] org.Mm.eg.db_2.8.0    mouse.db0_2.8.0       AnnotationForge_1.0.3 org.Hs.eg.db_2.8.0    RSQLite_0.11.2        DBI_0.2-5             AnnotationDbi_1.20.5  Biobase_2.18.0       
 [9] BiocGenerics_0.4.0    BiocInstaller_1.8.3  

loaded via a namespace (and not attached):
[1] IRanges_1.16.6  parallel_2.15.3 stats4_2.15.3   tools_2.15.3   

Many thanks!
Kamila