[BioC] AnnBuilder package: problem with basefile type of "refseq" or "gbNRef"
Luo Weijun
luo_weijun at yahoo.com
Mon Jul 24 18:03:38 CEST 2006
Hello,
Last time I had a problem with the base file type "ll"
(Entrez Gene), and John and Nianhua fixed it quickly.
This time I used the base file type "refseq" or
"gbNRef", because my base file mapping is based on
RefSeq IDs. I tried both 2-column and 3-column base files.
The package gets built, but a lot of annotation data is
missing (gene symbols, gene names, UniGene IDs, etc.),
and the annotation mappings that are present have no
entries at all. My feeling is that we might have a
problem with RefSeq IDs similar to the one we had last
time with Entrez Gene IDs, but I am not sure about this.
Please let me know if you have any ideas or suggestions.
Thank you so much.
Weijun
The code is almost the same as last time, except for the
parts related to the base file.
library(AnnBuilder)
load('/Users/luow/project/microarraydata/annotation/hs95av2Refseq7.Rdata')
# Column 1: probe IDs; column 2: RefSeq IDs (probe ID with '_at' removed)
myBase=cbind(hs95av2Refseq7,hs95av2Refseq7)
myBase[,2]=unlist(strsplit(myBase[,2],'_at'))
write.table(myBase,file='/Users/luow/project/microarraydata/annotation/hs95av2Refseq7Base.txt',sep='\t',row.names=F,col.names=F)
myBase='/Users/luow/project/microarraydata/annotation/hs95av2Refseq7Base.txt'
myBaseType <- "refseq"
#myBaseType <- "gbNRef"
myDir <- '/Users/luow/project/microarraydata/annotation/'
ABPkgBuilder(baseName = myBase, baseMapType = myBaseType,
             pkgName = "hs95av2Refseq7", pkgPath = myDir,
             organism = "Homo sapiens", version = "1.1.0",
             author = list(authors = "Weijun",
                           maintainer = "Weijun <luo_weijun at yahoo.com>"),
             fromWeb = T)
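As a sanity check on the base file itself, a minimal sketch along the following lines could be used. It is not a confirmed fix: the probe IDs are toy values, the helper name check_base is hypothetical, and the RefSeq pattern (two letters, an underscore, digits, optional version suffix) is an assumption. It writes the file with quote = FALSE because write.table quotes character columns by default; whether AnnBuilder cares about the quotes is not something I know.

```r
# Toy probe IDs standing in for the real hs95av2Refseq7 vector (illustrative only).
probes <- c("NM_000049_at", "NM_000050_at", "NM_000051_at")

# Strip only a trailing "_at". sub() returns exactly one result per input,
# whereas unlist(strsplit()) can return several pieces for one ID.
base <- cbind(probes, sub("_at$", "", probes))

# Write without quotes or headers so each line is "probe<TAB>RefSeq ID".
base_file <- tempfile(fileext = ".txt")
write.table(base, file = base_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Hypothetical helper: read the file back and count basic problems.
check_base <- function(path) {
  x <- read.delim(path, header = FALSE, sep = "\t", quote = "",
                  stringsAsFactors = FALSE)
  list(
    n_rows     = nrow(x),
    n_cols     = ncol(x),
    dup_probes = sum(duplicated(x[[1]])),
    # Assumed RefSeq shape, e.g. NM_000049 or NM_000049.3
    bad_refseq = sum(!grepl("^[A-Z]{2}_[0-9]+(\\.[0-9]+)?$", x[[2]]))
  )
}

res <- check_base(base_file)
print(res)
```

Nonzero values for dup_probes or bad_refseq would point at the base file rather than at the package builder.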
Here is part of my base file:
> a=read.delim('/Users/luow/project/microarraydata/annotation/hs95av2Refseq7Base.txt',sep='\t',head=F)[100:110,]
> a
              V1        V2
100 NM_000049_at NM_000049
101 NM_000050_at NM_000050
102 NM_000051_at NM_000051
103 NM_000053_at NM_000053
104 NM_000054_at NM_000054
105 NM_000055_at NM_000055
106 NM_000056_at NM_000056
107 NM_000057_at NM_000057
108 NM_000059_at NM_000059
109 NM_000060_at NM_000060
110 NM_000061_at NM_000061
>
Here is what I got after installing the package:
> library(hs95av2Refseq7)
> hs95av2Refseq7()
Quality control information for hs95av2Refseq7
Date built: Created: Sun Jul 23 14:52:11 2006
Number of probes: 12548
Probe number missmatch: hs95av2Refseq7ACCNUM;
hs95av2Refseq7CHRLOC; hs95av2Refseq7ENZYME;
hs95av2Refseq7LOCUSID; hs95av2Refseq7PATH
Probe missmatch: None
Mappings found for probe based rda files:
hs95av2Refseq7ACCNUM found 125485 of 12548
hs95av2Refseq7CHRLOC found 0 of 12548
hs95av2Refseq7ENZYME found 0 of 12548
hs95av2Refseq7LOCUSID found 0 of 12548
hs95av2Refseq7PATH found 0 of 12548
Mappings found for non-probe based rda files:
hs95av2Refseq7CHRLENGTHS found 25
hs95av2Refseq7ORGANISM found 1
hs95av2Refseq7PFAM found 0
hs95av2Refseq7PROSITE found 0
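The "found N of M" counts above can also be checked by hand. The sketch below assumes each mapping is an R environment keyed by probe ID, with NA for unmapped probes. The environment toy_map, its values, and the helper count_mapped are made up for illustration and are not taken from the built package.

```r
# Toy environment standing in for a mapping such as hs95av2Refseq7LOCUSID.
# The values are placeholders, not real Entrez Gene IDs.
toy_map <- new.env(hash = TRUE)
assign("NM_000049_at", "1001", envir = toy_map)
assign("NM_000050_at", NA, envir = toy_map)
assign("NM_000051_at", "1003", envir = toy_map)

# Hypothetical helper: count the keys whose value is not entirely NA.
count_mapped <- function(env) {
  vals <- mget(ls(env), envir = env)
  mapped <- vapply(vals, function(v) !all(is.na(v)), logical(1))
  c(found = sum(mapped), total = length(mapped))
}

counts <- count_mapped(toy_map)
print(counts)
```

Running the same helper on a real mapping environment from the installed package would show whether it truly has no entries.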
Here is my environment. Note that AnnBuilder 1.11.4
works fine with R 2.3.1: under the same environment I
got a good package built from a base file mapped to
Entrez Gene IDs.
> sessionInfo()
Version 2.3.1 (2006-06-01)
powerpc-apple-darwin8.6.0
attached base packages:
[1] "tools"     "methods"   "stats"     "graphics"  "grDevices" "utils"
[7] "datasets"  "base"
other attached packages:
AnnBuilder    RSQLite        DBI   annotate        XML    Biobase
  "1.11.4"    "0.4-1"   "0.1-10"   "1.10.0"   "0.99-7"   "1.10.0"
>