[BioC] Analysis of Affymetrix Mouse Gene 2.0 ST arrays
Naxerova, Kamila
naxerova at fas.harvard.edu
Wed Mar 6 16:17:04 CET 2013
Hi Jim,
thank you for your helpful reply. I have a few follow-up questions.
>
> I should throw in my obligatory cautionary statement about summarizing
> Gene ST data at the probeset (as compared to the transcript) level. If
> you look at the number of probes/probeset, there are a huge number with
> < 4 probes. So hypothetically you can do this, but I wouldn't.
I am a bit confused about transcript clusters and probesets. In the MoGene-2_0-st-v1.na33.mm10.transcript.csv file, each transcript cluster corresponds to exactly one probeset. But from your email it sounds like there are more probesets than transcript clusters. I assume these are stored in a different file? Unfortunately, the structure of the Affymetrix web site is a mystery to me. Without your direct link I would never have found the transcript annotation file, so I have no way of browsing the other annotation files to better understand what is going on.
Why is there a distinction between transcript cluster and probeset in the first place? I understand that it's useful to be able to group probes dynamically (based on our state of knowledge about a locus). If this grouping is defined as the transcript cluster, what is the definition of a probeset?
Am I right to assume that if I build my annotation using the MoGene-2_0-st-v1.na33.mm10.transcript.csv file, I essentially commit to analyzing my data at the transcript level?
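To make the question concrete, here is how I picture the two levels of grouping. This is only a toy sketch: the probe, probeset and transcript cluster IDs below are made up, and the data frame layout is my assumption, not the structure of any Affymetrix file. It counts probes per probeset and per transcript cluster and flags probesets with fewer than 4 probes, which is the situation you cautioned about.

```r
# Toy probe-level table; all IDs are invented for illustration.
probes <- data.frame(
  probe_id = 1:9,
  probeset_id = c("ps1", "ps1", "ps1", "ps2", "ps2",
                  "ps3", "ps3", "ps3", "ps3"),
  transcript_cluster_id = c(rep("tc1", 5), rep("tc2", 4)),
  stringsAsFactors = FALSE
)

# Number of probes in each group, at each level of summarization.
per_probeset <- table(probes$probeset_id)
per_cluster  <- table(probes$transcript_cluster_id)

# Probesets with fewer than 4 probes.
small_probesets <- names(per_probeset)[per_probeset < 4]

print(per_probeset)
print(per_cluster)
print(small_probesets)
```

In this toy case the transcript cluster tc1 pools the probes of two small probesets, which is what I understand to be the argument for summarizing at the transcript level.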
>
> library(AnnotationForge)
> library(mouse.db0)
> library(org.Mm.eg.db)
> makeDBPackage("MOUSECHIP_DB",
> affy=TRUE,
> prefix="mogene20sttranscriptcluster",
> fileName="MoGene-2_0-st-v1.na33.mm10.transcript.csv",
> outputDir = ".",
> version="2.11.1",
> manufacturer = "Affymetrix",
> chipName = "Human Gene 2.1 ST Array",
> manufacturerUrl = "http://www.affymetrix.com",
> author = "Kamila Naxerova",
> maintainer = "Kamila Naxerova <naxerova at fas.harvard.edu>")
>
>
Any thoughts on this error message?
> makeDBPackage("MOUSECHIP_DB",
+ affy=TRUE,
+ prefix="mogene20sttranscriptcluster",
+ fileName="MoGene-2_0-st-v1.na33.mm10.transcript.csv",
+ outputDir = ".",
+ version="2.11.1",
+ manufacturer = "Affymetrix",
+ chipName = "Mouse Gene 2.0 ST Array",
+ manufacturerUrl = "http://www.affymetrix.com",
+ author = "Kamila Naxerova",
+ maintainer = "Kamila Naxerova <naxerova at fas.harvard.edu>")
Error in `[.data.frame`(csvFile, , GenBankIDName) :
undefined columns selected
> sessionInfo()
R version 2.15.3 (2013-03-01)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] org.Mm.eg.db_2.8.0 mouse.db0_2.8.0 AnnotationForge_1.0.3 org.Hs.eg.db_2.8.0 RSQLite_0.11.2 DBI_0.2-5 AnnotationDbi_1.20.5 Biobase_2.18.0
[9] BiocGenerics_0.4.0 BiocInstaller_1.8.3
loaded via a namespace (and not attached):
[1] IRanges_1.16.6 parallel_2.15.3 stats4_2.15.3 tools_2.15.3
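Regarding the "undefined columns selected" error above: my guess is that makeDBPackage looks for a column that this transcript CSV does not contain. A minimal way to check which columns a file actually provides is sketched below. It builds a tiny stand-in file so that it runs on its own; the file contents and the "expected" column names are hypothetical placeholders (I do not know which name makeDBPackage actually looks up), and the leading "#" comment line is an assumption about the file layout.

```r
# Write a tiny stand-in for an annotation CSV (contents are invented).
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "#%lib-set-name=example",
  "\"transcript_cluster_id\",\"probeset_id\",\"gene_assignment\"",
  "\"17200001\",\"17200001\",\"NM_000001 // GeneA\""
), csv_path)

# Skip '#' comment lines and read the header plus data rows.
annot <- read.csv(csv_path, comment.char = "#", stringsAsFactors = FALSE)
available <- names(annot)

# Hypothetical column names that a parser might expect.
expected <- c("probeset_id", "Representative.Public.ID")

# Any expected column missing from the file would trigger
# "undefined columns selected" when selected by name.
missing_cols <- setdiff(expected, available)

print(available)
print(missing_cols)
```

Pointing csv_path at the real MoGene-2_0-st-v1.na33.mm10.transcript.csv file instead of the temporary file should show which of the expected columns, if any, are absent.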
Many thanks!
Kamila