[BioC] Affymetrix Human Gene 1.0 ST Array

Tue May 13 18:36:55 CEST 2008

Dear Benilton,

thanks for your response.  I built a pdInfoPackage as suggested:

library(pdInfoBuilder)
pgfFile = "HuGene-1_0-st-v1.r3.pgf"
clfFile = "HuGene-1_0-st-v1.r3.clf"
probeFile = "HuGene-1_0-st-v1.probe.tab"
transFile = "HuGene-1_0-st-v1.na24.hg18.transcript.csv"
pkg <- new("AffyGenePDInfoPkgSeed",
           version="0.0.1",
           author="Hans-Ulrich Klein", email="h.klein at uni-muenster.de",
           biocViews="AnnotationData",
           genomebuild="hg18",
           pgfFile=pgfFile, clfFile=clfFile,
           probeFile=probeFile, transFile=transFile)
makePdInfoPackage(pkg, destDir=".")

Creating package in ./pd.hugene.1.0.st.v1
loadUnitsByBatch took 67.19 sec
loadAffyCsv took 9.10 sec
loadAffySeqCsv took 95.06 sec
DB sort, index creation took 32.30 sec
Warning messages:
1: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
2: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'

I installed the package and built an ExpressionSet using RMA normalized 
probeset values exported from Affymetrix "Expression Console". Currently 
I am using the package by sending SQL statements directly, e.g.:

 > library("pd.hugene.1.0.st.v1")
 > con = db(pd.hugene.1.0.st.v1)
 > dbListTables(con)
 [1] "featureSet"   "mmfeature"    "pm_mm"        "pmfeature"    "qcmmfeature"
 [6] "qcpm_qcmm"    "qcpmfeature"  "sequence"     "sqlite_stat1" "table_info"
 > featureNames(eSet)[10000]
[1] "7973403"
 > res = dbSendQuery(con, "SELECT * FROM FeatureSet WHERE fsetid == 7973403;")
 > table = fetch(res)
 > table$gene_assignment
[1] "NM_138460 // CMTM5 // CKLF-like MARVEL transmembrane domain 
containing 5 // 14q11.2 // 116173 /// NM_001037288 // CMTM5 // CKLF-like 
MARVEL transmembrane domain containing 5 // 14q11.2 // 116173 /// 
ENST00000359320 // CMTM5 // CKLF-like MARVEL transmembrane domain 
containing 5 (CMTM5), transcript variant 1, mRNA // 14q11.2 // 116173 
/// ENST00000382809 // CMTM5 // CKLF-like MARVEL transmembrane domain 
containing 5 (CMTM5), transcript variant 3, mRNA // 14q11.2 // 116173 
/// AF527413 // CMTM5 // CKLF-like MARVEL transmembrane domain 
containing 5 // 14q11.2 // 116173 /// AK094840 // CMTM5 // CKLF-like 
MARVEL transmembrane domain containing 5 // 14q11.2 // 116173"
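The string can at least be taken apart in plain R. Below is a rough sketch, assuming that assignments are always separated by " /// " and the fields within one assignment by " // ", as in the output above. The helper parseGeneAssignment() is hypothetical (it is not part of pdInfoBuilder or the generated package), and the column names are only my reading of the five fields in this example.

```r
# Hypothetical helper, not part of pdInfoBuilder or pd.hugene.1.0.st.v1:
# split one gene_assignment string into a data frame, one row per assignment.
parseGeneAssignment <- function(x) {
    # Column names are assumptions inferred from the example output
    cols <- c("accession", "symbol", "description", "cytoband", "entrez_id")
    # Assignments are separated by " /// ", fields within one by " // "
    entries <- strsplit(x, " /// ", fixed = TRUE)[[1]]
    fields <- strsplit(entries, " // ", fixed = TRUE)
    # Keep only entries that have the expected number of fields
    fields <- fields[vapply(fields, length, integer(1)) == length(cols)]
    mat <- matrix(as.character(unlist(fields)), ncol = length(cols),
                  byrow = TRUE, dimnames = list(NULL, cols))
    as.data.frame(mat, stringsAsFactors = FALSE)
}

# Shortened copy of the string shown above, for illustration only
x <- paste("NM_138460 // CMTM5 // CKLF-like MARVEL transmembrane domain",
           "containing 5 // 14q11.2 // 116173 /// AK094840 // CMTM5 //",
           "CKLF-like MARVEL transmembrane domain containing 5 //",
           "14q11.2 // 116173")
ga <- parseGeneAssignment(x)
unique(ga$symbol)
```

In the session above, the same call would be parseGeneAssignment(table$gene_assignment). Entries that do not have exactly five fields are silently dropped, which may or may not be what one wants.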

This is OK for me at the moment, but it is laborious compared to classic 
annotation data packages (like "hgu95av2.db"). Is there a more 
convenient way to access annotation data?

Thanks in advance,
Hans-Ulrich