[BioC] AnnBuilder's parseData() doesn't recognize ACCs with underscore?

John Zhang jzhang at jimmy.harvard.edu
Thu Jan 18 17:58:14 CET 2007

>You're right, my problem is due to the mix of accession and RefSeq ids, so,
>correctly, gbUGParser wouldn't be expected to find the RefSeq ids (my
>description of "accessions including underscores" was pretty dopey, I
>admit). I just, probably in an attack of wild speculation, thought the "gb"
>scripts would automatically include the RefSeq ids because there are no
>REF2xxxParsers and gbNRef2LLParser is the only parser with RefSeq on the
>input side (as far as I can remember). gbNRef2LLParser returns LocusLink
>ids, but I would like to match UniGene ids, and there seems to be no parser
>that maps RefSeq ids to UniGene ids. So I should probably copy gbUGParser,
>rename the copy "gbNREF2UGParser", and add "_" to its regular expression.

Yes, you can always write your own parsers to meet special requirements.
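Writing such a parser mostly comes down to letting the accession pattern accept "_". The regular expression inside gbUGParser is not shown in this thread, so oldPattern and newPattern below are hypothetical stand-ins; the sketch only shows why a pattern without "_" rejects a RefSeq id such as NM_001815 while an extended one accepts it.

```r
# Hypothetical accession patterns, NOT the ones used in AnnBuilder's gbUGParser.
ids <- c("NM_001815", "BF897514", "D90042", "BC028014")

# Capital letters followed by digits only: no underscore allowed
oldPattern <- "^[A-Z]+[0-9]+$"
# Same, but with an optional underscore after the letter prefix (RefSeq style)
newPattern <- "^[A-Z]+_?[0-9]+$"

grepl(oldPattern, ids)  # FALSE TRUE TRUE TRUE
grepl(newPattern, ids)  # TRUE TRUE TRUE TRUE
```

The optional "_?" keeps plain GenBank accessions matching, so one pattern can serve a base file that mixes both kinds of ids.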

>-----Original Message-----
>From: John Zhang [mailto:jzhang at jimmy.harvard.edu] 
>Sent: 17 January 2007 15:12
>To: bioconductor at stat.math.ethz.ch; b.otto at uke.uni-hamburg.de
>Subject: Re: [BioC] AnnBuilders paseData() doesn't recognize ACCs with
>underscore?
>>parseData() seems to have trouble recognizing accession numbers that
>>include an underscore, such as "NM_001815". The function just doesn't
>>find them, although they do exist in the database file.
>You have used the wrong parser. There are parsers, such as egRefseqParser
>and gbNRef2LLParser, that handle RefSeq ids with underscores. You need to
>pick one that fits your data.
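Since the base file below mixes GenBank accessions and RefSeq ids, one way to pick parsers is to split the ids by type first. This is a sketch: isRefSeq is a hypothetical helper, and its pattern (two capital letters, "_", digits) is a simplification of the full RefSeq accession format. The accessions are taken from the geneNMap entries quoted in this thread.

```r
# Hypothetical helper: TRUE for RefSeq-style ids such as "NM_001815".
# Simplified pattern; real RefSeq accessions have more variants.
isRefSeq <- function(acc) grepl("^[A-Z]{2}_[0-9]+$", acc)

# Accessions from the second column of geneNMap (";"-separated values split out)
accs <- c("D90278", "M16652", "L00693", "NM_001815",
          "BF897514", "D90042", "BC028014")

# Group the ids so each group can be handed to a parser that understands it
groups <- split(accs, ifelse(isRefSeq(accs), "RefSeq", "GenBank"))
groups$RefSeq   # "NM_001815"
groups$GenBank  # the other six accessions, in their original order
```

The RefSeq group would then go to a RefSeq-aware parser such as the ones named above, and the GenBank group to gbUGParser.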
>>Here is the example I'm trying to get working:
>>>pkgpath <- .find.package("AnnBuilder")
>>># UniGene data file from AnnBuilder's data directory
>>>ugUrl <- "C:/Programme/R/R-2.4.1/library/AnnBuilder/data/Ths.data"
>>># parsing with gbUGParser against the ids listed in geneNMap
>>>ug <- UG(srcUrl = ugUrl,
>>>         parser = file.path(pkgpath, "scripts", "gbUGParser"),
>>>         baseFile = "geneNMap", organism = "Homo sapiens",
>>>         built = "N/A", fromWeb = FALSE)
>>The geneNMap file has the entries:
>>32468_f_at	D90278;M16652
>>32469_at	L00693
>>NM_001815	NM_001815
>>BF897514	BF897514
>>38912_at	D90042
>>BC028014	BC028014
>>D90042	D90042
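The entries above suggest a two-column, tab-delimited base file whose second column may hold several accessions separated by ";". This sketch reads that layout into one row per (id, accession) pair, which makes it easy to inspect exactly which accessions a parser is asked to map. The format is inferred from the quoted entries, not from AnnBuilder documentation.

```r
# The geneNMap entries quoted above, as tab-delimited text
baseText <- "32468_f_at\tD90278;M16652
32469_at\tL00693
NM_001815\tNM_001815
BF897514\tBF897514
38912_at\tD90042
BC028014\tBC028014
D90042\tD90042"

# Read both columns as character so ids like "32468_f_at" stay untouched
base <- read.delim(text = baseText, header = FALSE,
                   col.names = c("id", "acc"), colClasses = "character")

# Split multi-accession cells and expand to one row per (id, accession)
accList <- strsplit(base$acc, ";", fixed = TRUE)
long <- data.frame(id  = rep(base$id, lengths(accList)),
                   acc = unlist(accList),
                   stringsAsFactors = FALSE)
long
```

The seven base entries expand to eight rows, because 32468_f_at carries two accessions.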
>>This is what I get:
>>           [,1]         [,2]
>>32468_f_at "32468_f_at" "1084;63036"
>>32469_at   "32469_at"   "1084"      
>>38912_at   "38912_at"   "10"        
>>BF897514   "BF897514"   "1084"      
>>D90042     "D90042"     "10"        
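A quick way to see which base-file entries the parser dropped is to diff the input ids against the row names of the result. Both vectors below are copied by hand from this message; in a live session they would come from the base file and the parsed matrix.

```r
# Ids in the first column of geneNMap
inputIds  <- c("32468_f_at", "32469_at", "NM_001815", "BF897514",
               "38912_at", "BC028014", "D90042")
# Row names of the parsed result shown above
outputIds <- c("32468_f_at", "32469_at", "38912_at", "BF897514", "D90042")

# Entries present in the input but absent from the output, in input order
missingIds <- setdiff(inputIds, outputIds)
missingIds  # "NM_001815" "BC028014"
```

Besides the RefSeq id NM_001815, the comparison also lists BC028014, an accession without an underscore, so that entry is worth checking separately.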
>>Thanks a lot in advance for your help.
>>Benjamin Otto
>>Universitaetsklinikum Eppendorf Hamburg
>>Institut fuer Klinische Chemie
>>Martinistrasse 52
>>20246 Hamburg
>Jianhua Zhang
>Department of Medical Oncology
>Dana-Farber Cancer Institute
>44 Binney Street
>Boston, MA 02115-6084

