[BioC] AnnBuilder to build annotation package for hs133phsentrezg
Marc Carlson
mcarlson at fhcrc.org
Thu Jan 10 23:43:46 CET 2008
Tineke Casneuf wrote:
> Dear list,
>
> I am trying to build an Annotation package for the hgu133plus2 array (Affy
> array, human). For my data analysis I have been using the remapped
> hs133phsentrezgcdf package. An annotation package exists for this CDF but
> the contained information is very limited. So I decided to build my own,
> using AnnBuilder.
> Here's the code I used:
>
> #####
> library(AnnBuilder)
> library(hs133phsentrezgcdf)
> fN <- ls(hs133phsentrezgcdf) # to extract the featureNames in this CDF
> cfN <- sub("_at", "", fN) # to obtain the entrez gene IDs
>
> ## construct the base file
> mygeneNMap <- matrix(c(fN, cfN), byrow=F, ncol =2)
> write.table(mygeneNMap, file = "geneNMap", sep = "\t", quote = FALSE,
> row.names = FALSE, col.names = FALSE)
>
> ## retrieve where to go look for the info
> mySrcUrls <- getSrcUrl("all", "Homo sapiens")
>
> myDir <- "."
>
> mySrcUrls <- getSrcUrl("all", "Homo sapiens")
>
> ABPkgBuilder(baseName = "geneNMap", srcUrls = mySrcUrls,
> baseMapType = "ll", pkgName = "hs133phsentrezgannot",
> pkgPath = ".", organism = "Homo sapiens", version = "1.1.0",
> author = list(authors = "Tine Casneuf", maintainer =
> "Tine, <tineke.casneuf at gmail.com"), fromWeb = TRUE)
> ######
>
> My baseName file has the probeset IDs in the first column, the Entrez Gene
> ID in the second and thus looks like this:
> > read.table(file = "geneNMap", sep ="\t")[1:4,]
> V1 V2
> 1 1_at 1
> 2 10_at 10
> 3 100_at 100
> 4 1000_at 1000
>
> The ABPkgBuilder function runs without errors or significant warnings. The
> package builds and can be installed, but no mappings were found for my
> data, as you can see below:
>
>
> > hs133phsentrezgannot()
>
> Quality control information for hs133phsentrezgannot
> Date built: Created: Thu Jan 10 11:55:27 2008
>
> Number of probes: 17589
> Probe number missmatch: None
> Probe missmatch: None
> Mappings found for probe based rda files:
> hs133phsentrezgannotCHRLOC found 0 of 17589
> hs133phsentrezgannotENTREZID found 0 of 17589
> hs133phsentrezgannotENZYME found 0 of 17589
> hs133phsentrezgannotPATH found 0 of 17589
> Mappings found for non-probe based rda files:
> hs133phsentrezgannotCHRLENGTHS found 25
> hs133phsentrezgannotORGANISM found 1
> hs133phsentrezgannotPFAM found 0
> hs133phsentrezgannotPROSITE found 0
> Does anyone have a clue what I am doing wrong? It will probably be something
> small, but I cannot figure it out.
>
> Many thanks in advance!
> Best,
> Tine
>
> #####
> My sessionInfo:
>
> > sessionInfo()
> R version 2.6.1 (2007-11-26)
> i386-pc-mingw32
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
> attached base packages:
> [1] tools stats graphics grDevices utils datasets methods base
> other attached packages:
> [1] hs133phsentrezgannot_1.1.0 GO_2.0.1 hs133phsentrezgprobe_10.0.0 matchprobes_1.10.0
> [5] hs133phsentrezgcdf_10.0.0 AnnBuilder_1.16.0 annotate_1.16.1 xtable_1.5-2
> [9] AnnotationDbi_1.0.6 RSQLite_0.6-4 DBI_0.2-4 XML_1.93-2.1
> [13] affy_1.16.0 preprocessCore_1.0.0 affyio_1.6.1 Biobase_1.16.2
> loaded via a namespace (and not attached):
> [1] rcompgen_0.1-17
>
# I did the following and got it to work:
# First, I started with your code to put the IDs into a file.
# (Please note that I am just assuming that what you are doing for this
# part is OK.)
library(AnnBuilder)
library(hs133phsentrezgcdf)
fN <- ls(hs133phsentrezgcdf) # to extract the featureNames in this CDF
cfN <- sub("_at", "", fN) # to obtain the entrez gene IDs
## construct the base file
mygeneNMap <- matrix(c(fN, cfN), byrow = FALSE, ncol = 2)
write.table(mygeneNMap, file = "geneNMap", sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
mySrcUrls <- getSrcUrl("all", "Homo sapiens")
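A minimal sketch of how the base file could be sanity-checked before the build, using base R only. The helper name checkBaseFile and the particular checks are assumptions for illustration; they are not part of AnnBuilder and do not by themselves explain the zero mappings reported above. The demonstration writes a four-row temporary file rather than touching the real geneNMap.

```r
# Hypothetical helper (not part of AnnBuilder): inspect a two-column,
# tab-separated base file of probe set IDs and Entrez Gene IDs.
checkBaseFile <- function(path) {
  base <- read.table(path, sep = "\t", header = FALSE,
                     colClasses = "character", quote = "",
                     comment.char = "")
  stopifnot(ncol(base) == 2)
  names(base) <- c("probe", "entrez")
  list(
    nProbes       = nrow(base),
    anyMissing    = any(is.na(base) | base == ""),
    anyDupProbe   = any(duplicated(base$probe)),
    # IDs in the second column that are not plain integers
    nonNumericIds = sum(!grepl("^[0-9]+$", base$entrez)),
    # Assumes each probe set ID is the second-column ID followed by "_at"
    idsConsistent = all(base$probe == paste(base$entrez, "_at", sep = ""))
  )
}

# Self-contained demonstration on a temporary file
demoFile <- tempfile()
demo <- cbind(c("1_at", "10_at", "100_at", "1000_at"),
              c("1", "10", "100", "1000"))
write.table(demo, file = demoFile, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
res <- checkBaseFile(demoFile)
print(res)
```

On the real file the call would be checkBaseFile("geneNMap"). A non-zero nonNumericIds would point at rows whose second column is not a bare Entrez Gene ID.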
#Your file of IDs looks like the right format for AnnBuilder at this point.
#So I called ABPkgBuilder() like this:
ABPkgBuilder(baseName = "/home/mcarlson/tasks/tineke/geneNMap",
             srcUrls = mySrcUrls, baseMapType = "ll", pkgName = "chipFoo",
             pkgPath = "/home/mcarlson/tasks/tineke",
             organism = "Homo sapiens", version = "1.0.0",
             author = list(author = "Joe",
                           maintainer = "<joe.joe at gmail.com>"),
             fromWeb = TRUE)
# Here is my sessionInfo(), where I have tried to use the same version
# of R:
sessionInfo()
R version 2.6.1 Patched (2008-01-09 r43930)
x86_64-unknown-linux-gnu
locale:
LC_CTYPE=en_US;LC_NUMERIC=C;LC_TIME=en_US;LC_COLLATE=en_US;LC_MONETARY=en_US;LC_MESSAGES=en_US;LC_PAPER=en_US;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US;LC_IDENTIFICATION=C
attached base packages:
[1] tools stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] GO_2.0.1 hs133phsentrezgcdf_10.0.0
[3] AnnBuilder_1.16.0 annotate_1.16.1
[5] xtable_1.5-2 AnnotationDbi_1.0.6
[7] RSQLite_0.6-4 DBI_0.2-4
[9] XML_1.93-2 Biobase_1.16.2
loaded via a namespace (and not attached):
[1] rcompgen_0.1-17
I hope this helps,
Marc