[BioC] Analysis of Affymetrix Mouse Gene 2.0 ST arrays

Thu Mar 7 17:19:21 CET 2013

I should also mention that you need the transFile argument as well. It 
should point to the

/Users/naxerova/Documents/xxx/MoGene-2_0-st-v1.na33.mm10.transcript.csv

file that you used to create the mogene20sttranscriptcluster.db package.
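
Combined with the .mps file from my earlier note (quoted below), the seed 
would look roughly like the following. This is an untested sketch: 
findOne is a small helper made up for this example and is not part of 
pdInfoBuilder, and it assumes the transcript csv and the .mps file sit in 
the same baseDir as the pgf, clf and probeset files.

```r
library(pdInfoBuilder)

baseDir <- "/Users/naxerova/Documents/xxx"

# Hypothetical helper (not part of pdInfoBuilder): return the single file
# in baseDir whose name matches 'pattern', or stop with a clear message.
findOne <- function(pattern) {
  hits <- list.files(baseDir, pattern = pattern, full.names = TRUE)
  if (length(hits) != 1L)
    stop("expected exactly one file matching '", pattern,
         "', found ", length(hits))
  hits
}

# Anchored patterns, so that e.g. ".pgf" cannot match in the middle of a name.
pgf   <- findOne("\\.pgf$")
clf   <- findOne("\\.clf$")
prob  <- findOne("\\.probeset\\.csv$")
trans <- findOne("\\.transcript\\.csv$")
mps   <- findOne("\\.mps$")

# Same seed as before, with transFile and coreMps filled in.
seed <- new("AffyGenePDInfoPkgSeed",
            pgfFile = pgf, clfFile = clf,
            probeFile = prob, transFile = trans,
            coreMps = mps,
            author = "Kamila Naxerova",
            email = "naxerova at fas.harvard.edu",
            biocViews = "AnnotationData",
            organism = "Mouse", species = "Mus Musculus")

makePdInfoPackage(seed, destDir = ".")
```

Failing early when a file is missing or ambiguous should give a clearer 
message than the "cannot open file 'coreMps'" error further down.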

Best,

Jim

On 3/7/2013 11:03 AM, James W. MacDonald wrote:
> Wow. This is really an education on the vast unwashed underbelly of 
> BioC, no?
>
> There is a file called MoGene-2_0-st.mps that came in the zip file you 
> downloaded. Add
>
> mps <- list.files(baseDir, pattern = "mps$", full.names = TRUE)
>
> and then
>
> coreMps = mps
>
> when you create your AffyGenePDInfoPkgSeed. This file is used to 
> distinguish between the probeset and transcript probe mappings.
>
> Best,
>
> Jim
>
>
>
> On 3/7/2013 10:36 AM, Naxerova, Kamila wrote:
>> Thanks Jim. Of course the annotation package does not contain probe 
>> -->  probe set information. What was I thinking?!??
>>
>> What I had not realized was that I needed to build the 
>> pd.mogene.2.0.st package myself first, because it also does not exist 
>> on Bioconductor. So I just downloaded all the required files from 
>> Affy, but again I am stuck with an error message I don't 
>> understand... what is the coreMPS file that gives me the error?
>>
>>> library(pdInfoBuilder)
>>> baseDir<- "/Users/naxerova/Documents/xxx"
>>> (pgf<- list.files(baseDir, pattern = ".pgf",
>> + full.names = TRUE))
>> [1] "/Users/naxerova/Documents/xxx/MoGene-2_0-st.pgf"
>>> (clf<- list.files(baseDir, pattern = ".clf",
>> + full.names = TRUE))
>> [1] "/Users/naxerova/Documents/xxx/MoGene-2_0-st.clf"
>>> (prob<- list.files(baseDir, pattern = ".probeset.csv",
>> + full.names = TRUE))
>> [1] 
>> "/Users/naxerova/Documents/xxx/MoGene-2_0-st-v1.na33.mm10.probeset.csv"
>>> seed<- new("AffyGenePDInfoPkgSeed",
>> + pgfFile = pgf, clfFile = clf,
>> + probeFile = prob, author = "Kamila Naxerova",
>> + email = "naxerova at fas.harvard.edu",
>> + biocViews = "AnnotationData",
>> + organism = "Mouse", species = "Mus Musculus")
>>> makePdInfoPackage(seed, destDir = ".")
>> =============================================================================================================================================== 
>>
>> Building annotation package for Affymetrix Gene ST Array
>> PGF.........: MoGene-2_0-st.pgf
>> CLF.........: MoGene-2_0-st.clf
>> Probeset....: MoGene-2_0-st-v1.na33.mm10.probeset.csv
>> Transcript..: TheTranscriptFile
>> Core MPS....: coreMps
>> =============================================================================================================================================== 
>>
>> Parsing file: MoGene-2_0-st.pgf... OK
>> Parsing file: MoGene-2_0-st.clf... OK
>> Creating initial table for probes... OK
>> Creating dictionaries... OK
>> Parsing file: MoGene-2_0-st-v1.na33.mm10.probeset.csv... OK
>> Parsing file: coreMps... Error in file(file, "rt") : cannot open the 
>> connection
>> In addition: Warning message:
>> In file(file, "rt") : cannot open file 'coreMps': No such file or 
>> directory
>>
>>
>>
>>
>>
>> On Mar 7, 2013, at 10:06 AM, "James W. MacDonald"<jmacdon at uw.edu>  
>> wrote:
>>
>>> Hi Kamila,
>>>
>>> On 3/7/2013 9:54 AM, Naxerova, Kamila wrote:
>>>> Dear all,
>>>>
>>>> I am afraid I have to ask for help with the Mouse Gene 2.0 ST 
>>>> annotation package one more time. It looked like I created it 
>>>> successfully, but when I try to use it to read in cel files with 
>>>> the oligo package, I get a cryptic error message. Any suggestions 
>>>> would be much appreciated!
>>> You don't use the annotation package at this step. There are two
>>> packages that are used for the analysis of this chip type. The first is
>>> the pd.mogene.2.0.st.v1 package, which is used by oligo to map 
>>> probes to
>>> probesets when doing the normalization/summarization step. This package
>>> will be automagically installed if you don't have it, so there is
>>> nothing to be done at the first step but
>>>
>>> abatch<- read.celfiles(list.celfiles())
>>> eset<- rma(abatch)
>>>
>>> This will give you the summarized and normalized data at the transcript
>>> level. You then will normally fit some model(s) using the modeling
>>> package of your choice, and then might want to output a set of
>>> significant genes, at which time you will use the
>>> mogene20sttranscriptcluster.db package to map probeset IDs to gene
>>> information.
>>>
>>> Best,
>>>
>>> Jim
>>>
>>>
>>>>> abatch<- 
>>>>> read.celfiles(list.celfiles(),pkgname="mogene20sttranscriptcluster.db") 
>>>>>
>>>> Platform design info loaded.
>>>> Reading in : xxx.CEL
>>>> Reading in : xxx.CEL
>>>> Reading in : xxx.CEL
>>>> [... more cel files listed]
>>>>
>>>> Error in function (classes, fdef, mtable)  :
>>>>    unable to find an inherited method for function ‘kind’ for 
>>>> signature ‘"ChipDb"’
>>>>
>>>> Thanks
>>>> Kamila
>>>>
>>>> On Mar 6, 2013, at 6:16 PM, "Naxerova, 
>>>> Kamila"<naxerova at fas.harvard.edu>   wrote:
>>>>
>>>>> Dear Christian and Jim,
>>>>>
>>>>> many thanks to both of you for your explanations.
>>>>>
>>>>> Your hard work paid off, and I have finally understood everything 
>>>>> and managed to build my annotation package!!!! I wrote a little 
>>>>> script similar to what Jim was suggesting, namely picking the 
>>>>> first RefSeq-like thing I came across. Jim called it "naive" -- 
>>>>> but I think there is no downside to this approach, right? I have 
>>>>> looked at various examples in the Affy file for a long time, and 
>>>>> simply picking the first Refseq ID seems to be kosher.
>>>>>
>>>>> data<-read.csv("MoGene-transcript-noheader.csv",header=T,stringsAsFactors=F,sep=",") 
>>>>>
>>>>> sdata<- data[,c(1,9)]
>>>>>
>>>>> returnRef=function(x){
>>>>>     refst<- 
>>>>> strsplit(x,split="///")[[1]][grep("RefSeq",strsplit(x,split="///")[[1]])[1]]
>>>>>     refid<- gsub(" ","",strsplit(refst,split="//")[[1]][1])
>>>>>     return(refid)
>>>>> }
>>>>>
>>>>> sdata$refseqids<- sapply(sdata[,2],returnRef)
>>>>> fdata<- sdata[,-2]
>>>>> write.table(fdata,"AnnotBuild.txt", 
>>>>> sep="\t",quote=F,row.names=F,col.names=F)
>>>>>
>>>>> library(AnnotationForge)
>>>>> library(mouse.db0)
>>>>> library(org.Mm.eg.db)
>>>>> makeDBPackage("MOUSECHIP_DB",
>>>>> affy=F,
>>>>> prefix="mogene20sttranscriptcluster",
>>>>> fileName="AnnotBuild.txt",
>>>>> outputDir = ".",
>>>>> version="2.11.1",
>>>>> baseMapType="refseq",
>>>>> manufacturer = "Affymetrix",
>>>>> chipName = "Mouse Gene 2.0 ST Array",
>>>>> manufacturerUrl = "http://www.affymetrix.com",
>>>>> author = "Kamila Naxerova",
>>>>> maintainer = "Kamila Naxerova<naxerova at fas.harvard.edu>")
>>>>>
>>>>>> install.packages("mogene20sttranscriptcluster.db",repos=NULL, 
>>>>>> type="source")
>>>>> * installing *source* package ‘mogene20sttranscriptcluster.db’ ...
>>>>> ** R
>>>>> ** inst
>>>>> ** preparing package for lazy loading
>>>>> ** help
>>>>> *** installing help indices
>>>>> ** building package indices
>>>>> ** testing if installed package can be loaded
>>>>> *** arch - i386
>>>>> *** arch - x86_64
>>>>>
>>>>> * DONE (mogene20sttranscriptcluster.db)
>>>>>
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at r-project.org
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives: 
>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>> -- 
>>> James W. MacDonald, M.S.
>>> Biostatistician
>>> University of Washington
>>> Environmental and Occupational Health Sciences
>>> 4225 Roosevelt Way NE, # 100
>>> Seattle WA 98105-6099
>>>
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099