[BioC] Analysis of Affymetrix Mouse Gene 2.0 ST arrays

Thu Mar 7 00:16:19 CET 2013

Dear Christian and Jim,

Many thanks to both of you for your explanations.

Your hard work paid off: I have finally understood everything and managed to build my annotation package! I wrote a little script similar to what Jim suggested, namely picking the first RefSeq-like thing I came across. Jim called it "naive", but I think there is no downside to this approach, right? I looked through various examples in the Affy annotation file for a long time, and simply picking the first RefSeq ID seems to be kosher.
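
One way to probe the "no downside" question is to count how many distinct RefSeq IDs each assignment string carries, since entries with more than one are where picking the first ID drops information. The following is only a sketch under assumptions: the assignment strings are made-up toy values in the "a // b /// c // d" layout, and the helper name allRefSeq is hypothetical, not something from the Affy file or from AnnotationForge.

```r
# Toy assignment strings (invented for illustration, not real annotation data).
assignments <- c(
  "NM_000001 // RefSeq // toy gene A /// ENSMUST00000000001 // ENSEMBL // toy gene A",
  "NM_000002 // RefSeq // toy gene B /// NM_000003 // RefSeq // toy gene B variant",
  "---"
)

# Hypothetical helper: return all distinct RefSeq IDs found in one string.
allRefSeq <- function(x) {
  # Entries are separated by "///", fields within an entry by "//".
  entries <- strsplit(x, split = "///", fixed = TRUE)[[1]]
  entries <- entries[grepl("RefSeq", entries, fixed = TRUE)]
  ids <- vapply(strsplit(entries, split = "//", fixed = TRUE),
                function(fields) trimws(fields[1]), character(1))
  unique(ids)
}

# Number of distinct RefSeq IDs per string; values above 1 mark the
# cases where "take the first one" discards alternatives.
nRefSeq <- vapply(assignments, function(x) length(allRefSeq(x)), integer(1),
                  USE.NAMES = FALSE)
table(nRefSeq)
```

Run on the real assignment column instead of the toy vector, the table would show how often the first-ID rule actually has to choose between alternatives.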

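# Read the Affymetrix transcript annotation file
annot <- read.csv("MoGene-transcript-noheader.csv", header=TRUE, stringsAsFactors=FALSE, sep=",")
# Keep column 1 (probeset ID) and column 9 (the assignment string that lists the RefSeq entries)
sdata <- annot[,c(1,9)]

# Return the ID of the first "///"-separated entry that mentions RefSeq (NA if there is none)
returnRef <- function(x){
	entries <- strsplit(x,split="///")[[1]]
	refst <- entries[grep("RefSeq",entries)[1]]
	# The ID is the first "//"-separated field; strip the surrounding spaces
	refid <- gsub(" ","",strsplit(refst,split="//")[[1]][1])
	return(refid)
}

sdata$refseqids <- sapply(sdata[,2],returnRef,USE.NAMES=FALSE)
# Drop the raw assignment column, leaving probeset ID and RefSeq ID
fdata <- sdata[,-2]
# Two-column, tab-separated file without header, used as input for makeDBPackage below
write.table(fdata,"AnnotBuild.txt", sep="\t",quote=FALSE,row.names=FALSE,col.names=FALSE)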
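
To illustrate what the extraction step does, here is a small self-contained sketch on invented input. The probeset IDs and assignment strings are hypothetical, and firstRefSeq is just a compact restatement of returnRef under a made-up name (it uses trimws instead of gsub, which gives the same result for IDs without internal spaces).

```r
# Compact restatement of the first-RefSeq rule (hypothetical name).
firstRefSeq <- function(x) {
  entries <- strsplit(x, split = "///", fixed = TRUE)[[1]]
  # First entry mentioning RefSeq; indexing with NA yields NA when none match.
  hit <- entries[grep("RefSeq", entries, fixed = TRUE)[1]]
  trimws(strsplit(hit, split = "//", fixed = TRUE)[[1]][1])
}

# Toy table: one probeset with a RefSeq entry in second position,
# one probeset without any assignment.
toy <- data.frame(
  probeset = c("10000001", "10000002"),
  mrna = c("ENSMUST00000000001 // ENSEMBL // toy /// NM_000001 // RefSeq // toy",
           "---"),
  stringsAsFactors = FALSE
)

toy$refseq <- vapply(toy$mrna, firstRefSeq, character(1), USE.NAMES = FALSE)
toy[, c("probeset", "refseq")]
```

The first probeset gets the RefSeq ID even though an Ensembl entry precedes it, and the unannotated probeset ends up with NA rather than an error.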

library(AnnotationForge)
library(mouse.db0)
library(org.Mm.eg.db)
makeDBPackage("MOUSECHIP_DB",
affy=FALSE,
prefix="mogene20sttranscriptcluster",
fileName="AnnotBuild.txt",
outputDir = ".",
version="2.11.1",
baseMapType="refseq",
manufacturer = "Affymetrix",
chipName = "Mouse Gene 2.0 ST Array",
manufacturerUrl = "http://www.affymetrix.com",
author = "Kamila Naxerova",
maintainer = "Kamila Naxerova <naxerova at fas.harvard.edu>")

> install.packages("mogene20sttranscriptcluster.db",repos=NULL, type="source")
* installing *source* package ‘mogene20sttranscriptcluster.db’ ...
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
*** arch - i386
*** arch - x86_64

* DONE (mogene20sttranscriptcluster.db)