[BioC] AnnBuilder to build annotation package for hs133phsentrezg

Marc Carlson mcarlson at fhcrc.org
Thu Jan 10 23:43:46 CET 2008


Tineke Casneuf wrote:
> Dear list,
>
> I am trying to build an Annotation package for the hgu133plus2 array (Affy
> array, human). For my data analysis I have been using the remapped
> hs133phsentrezgcdf package. An annotation package exists for this CDF but
> the contained information is very limited. So I decided to build my own,
> using AnnBuilder.
> Here's the code I used:
>
> #####
>   library(AnnBuilder)
>   library(hs133phsentrezgcdf)
>   fN <- ls(hs133phsentrezgcdf)   # to extract the featureNames in this CDF
>   cfN <- sub("_at", "", fN)     #  to obtain the entrez gene IDs
>
> ## construct the base file
>   mygeneNMap <- matrix(c(fN, cfN), byrow=F, ncol =2)
>   write.table(mygeneNMap, file = "geneNMap", sep = "\t", quote = FALSE,
> row.names = FALSE, col.names = FALSE)
>
> ## retrieve where to go look for the info
> mySrcUrls <- getSrcUrl("all", "Homo sapiens")
>
> myDir <- "."
>
>   mySrcUrls <- getSrcUrl("all", "Homo sapiens")
>
>  ABPkgBuilder(baseName = "geneNMap", srcUrls = mySrcUrls,
>    baseMapType  = "ll", pkgName = "hs133phsentrezgannot",
>    pkgPath = ".", organism =  "Homo sapiens", version = "1.1.0",
>    author = list(authors = "Tine Casneuf",  maintainer =
>    "Tine, <tineke.casneuf at gmail.com"), fromWeb = TRUE)
> ######
>
> My baseName file has the probeset IDs in the first column, the Entrez Gene
> ID in the second and thus looks like this:
>  > read.table(file = "geneNMap", sep ="\t")[1:4,]
>         V1   V2
>  1    1_at    1
>  2   10_at   10
>  3  100_at  100
>  4 1000_at 1000
>
> The ABPkgBuilder function runs without errors or significant warnings. The
> package builds and can be installed, but no mappings were made for my data,
> as you can see below:
>
>  > hs133phsentrezgannot()
>
> Quality control information for  hs133phsentrezgannot
> Date built: Created: Thu Jan 10 11:55:27 2008
>
> Number of probes: 17589
> Probe number missmatch: None
> Probe missmatch: None
> Mappings found for probe based rda files:
>          hs133phsentrezgannotCHRLOC found 0 of 17589
>          hs133phsentrezgannotENTREZID found 0 of 17589
>          hs133phsentrezgannotENZYME found 0 of 17589
>          hs133phsentrezgannotPATH found 0 of 17589
> Mappings found for non-probe based rda files:
>          hs133phsentrezgannotCHRLENGTHS found 25
>          hs133phsentrezgannotORGANISM found 1
>          hs133phsentrezgannotPFAM found 0
>          hs133phsentrezgannotPROSITE found 0
> Does anyone have a clue what I am doing wrong? It will probably be something
> small, but I cannot figure it out.
>
> Many thanks in advance!
> Best,
> Tine
>
> #####
> My sessionInfo:
>  > sessionInfo()
> R version 2.6.1 (2007-11-26)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] tools     stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
>  [1] hs133phsentrezgannot_1.1.0  GO_2.0.1
>  [3] hs133phsentrezgprobe_10.0.0 matchprobes_1.10.0
>  [5] hs133phsentrezgcdf_10.0.0   AnnBuilder_1.16.0
>  [7] annotate_1.16.1             xtable_1.5-2
>  [9] AnnotationDbi_1.0.6         RSQLite_0.6-4
> [11] DBI_0.2-4                   XML_1.93-2.1
> [13] affy_1.16.0                 preprocessCore_1.0.0
> [15] affyio_1.6.1                Biobase_1.16.2
>
> loaded via a namespace (and not attached):
> [1] rcompgen_0.1-17
>
# I did the following and got it to work:

# First, I started with your code to put the IDs into a file.
# (Please note that I am just assuming that what you are doing for this
# part is OK.)
library(AnnBuilder)
library(hs133phsentrezgcdf)
fN <- ls(hs133phsentrezgcdf)   # to extract the featureNames in this CDF
cfN <- sub("_at", "", fN)     #  to obtain the entrez gene IDs

## construct the base file
mygeneNMap <- matrix(c(fN, cfN), byrow = FALSE, ncol = 2)
write.table(mygeneNMap, file = "geneNMap", sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

mySrcUrls <- getSrcUrl("all", "Homo sapiens")

#Your file of IDs looks like the right format for AnnBuilder at this point.
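
As a quick sanity check on the base file format, here is a minimal sketch in base R only (no AnnBuilder needed). The probe set IDs below are made up for illustration, including the "AFFX-hypothetical_at" control-style ID, and the file is written to a temporary location rather than "geneNMap". It assumes the base file should be two tab-separated columns with no header, as in the listing you posted. Anchoring the pattern as "_at$" is my own suggestion so that only a trailing "_at" is stripped.

```r
# Hypothetical probe set IDs standing in for ls(hs133phsentrezgcdf)
fN <- c("1_at", "10_at", "100_at", "1000_at", "AFFX-hypothetical_at")

# Anchor the pattern with "$" so only a trailing "_at" is removed
cfN <- sub("_at$", "", fN)

# Two-column base file: probe set ID, then the candidate Entrez Gene ID
baseFile <- tempfile("geneNMap")
write.table(cbind(fN, cfN), file = baseFile, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Read it back as character so the IDs are not converted to numbers
chk <- read.table(baseFile, sep = "\t", header = FALSE,
                  colClasses = "character")

nCols      <- ncol(chk)                     # number of columns in the file
uniqueIds  <- !any(duplicated(chk[[1]]))    # TRUE if no probe set ID repeats
numericIds <- grepl("^[0-9]+$", chk[[2]])   # TRUE where column 2 is a plain integer

# Rows whose second column is not a plain integer deserve a closer look
chk[!numericIds, ]
```

Any rows printed by the last line have a second column that is not a plain integer, so they cannot be Entrez Gene IDs; whether to drop them before building is a judgment call.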

#So I called ABPkgBuilder() like this:
ABPkgBuilder(baseName = "/home/mcarlson/tasks/tineke/geneNMap",
             srcUrls = mySrcUrls, baseMapType = "ll", pkgName = "chipFoo",
             pkgPath = "/home/mcarlson/tasks/tineke",
             organism = "Homo sapiens", version = "1.0.0",
             author = list(author = "Joe",
                           maintainer = "<joe.joe at gmail.com>"),
             fromWeb = TRUE)
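
One visible difference from the original call is that baseName and pkgPath are given as absolute paths here. Nothing in this thread establishes that this is what made the build succeed, so treat the following only as a sketch of how one might resolve and check both paths in base R before calling the builder. The directory and the one-line base file are hypothetical stand-ins created in a temporary location.

```r
# Hypothetical working directory; a temp dir is used so the sketch runs anywhere
workDir <- tempfile("annbuild")
dir.create(workDir)

# Stand-in base file with a single probe set ID / Entrez Gene ID pair
writeLines("1_at\t1", file.path(workDir, "geneNMap"))

# Resolve both locations to absolute paths; mustWork = TRUE raises an
# error immediately if either path does not exist
basePath <- normalizePath(file.path(workDir, "geneNMap"), mustWork = TRUE)
pkgDir   <- normalizePath(workDir, mustWork = TRUE)

# Confirm the base file is present and the package directory is a directory
stopifnot(file.exists(basePath), isTRUE(file.info(pkgDir)$isdir))

# basePath and pkgDir could then be passed as baseName and pkgPath
```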


# Here is my sessionInfo(), where I have tried to use the same
# version of R:
sessionInfo()
R version 2.6.1 Patched (2008-01-09 r43930)
x86_64-unknown-linux-gnu

locale:
LC_CTYPE=en_US;LC_NUMERIC=C;LC_TIME=en_US;LC_COLLATE=en_US;LC_MONETARY=en_US;LC_MESSAGES=en_US;LC_PAPER=en_US;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US;LC_IDENTIFICATION=C

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] GO_2.0.1                  hs133phsentrezgcdf_10.0.0
 [3] AnnBuilder_1.16.0         annotate_1.16.1
 [5] xtable_1.5-2              AnnotationDbi_1.0.6
 [7] RSQLite_0.6-4             DBI_0.2-4
 [9] XML_1.93-2                Biobase_1.16.2

loaded via a namespace (and not attached):
[1] rcompgen_0.1-17
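
To confirm that a build actually produced mappings (the "found 0 of 17589" problem in the QC report), one can count the probe set IDs that map to something other than NA. The sketch below assumes the built package stores each mapping as an R environment keyed by probe set ID, with NA for unmapped probes. The toy environment and the helper countMapped() are hypothetical and only stand in for an object such as chipFooENTREZID.

```r
# Toy environment standing in for a mapping such as chipFooENTREZID
# (hypothetical data: two mapped probe sets and one unmapped)
toyMap <- new.env()
assign("1_at", "1", envir = toyMap)
assign("10_at", "10", envir = toyMap)
assign("100_at", NA, envir = toyMap)

# Hypothetical helper: count keys whose value is not entirely NA
countMapped <- function(env) {
  vals <- mget(ls(env), envir = env)
  sum(!vapply(vals, function(v) all(is.na(v)), logical(1)))
}

nMapped <- countMapped(toyMap)
nProbes <- length(ls(toyMap))
cat("found", nMapped, "of", nProbes, "\n")
```

With a real package loaded, the same helper could be pointed at each probe-based environment in turn.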



I hope this helps,

    Marc


