[BioC] AnnBuilder's parseData() doesn't recognize ACCs with underscore?
jzhang at jimmy.harvard.edu
Thu Jan 18 17:58:14 CET 2007
>You're right, my problem is due to the mix of accession and RefSeq IDs, so
>the gbUGParser, being correct, wouldn't be expected to find the RefSeq IDs (my
>description of "accessions including underscores" was pretty dopey, I
>admit). I just, probably in an attack of wild speculation, thought the "gb"
>scripts would automatically include the RefSeq IDs because there are no
>REF2xxxParsers and the gbNRef2LLParser is the only parser with RefSeq on the
>input side (as far as I can remember). The gbNRef2LLParser returns LocusLink
>IDs, but I would like to match UniGene IDs and there seems to be no ...
>So I should probably rename a copy of the gbUGParser to "gbNREF2UGParser"
>and add the "_" to the regular expression.
Yes, you can always write your own parsers to meet special requirements.
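To see why an accession-matching rule can silently skip RefSeq IDs, here is a minimal base-R sketch. The two patterns are hypothetical and only illustrate the idea of adding "_" to a regular expression. They are not taken from gbUGParser or any other AnnBuilder script. Real RefSeq IDs can also carry a version suffix such as ".2", which neither pattern accepts.

```r
# Hypothetical accession patterns, for illustration only; these are NOT the
# regular expressions actually used inside AnnBuilder's parser scripts.
ids <- c("BF897514", "D90042", "NM_001815")

# GenBank-style rule: uppercase letters followed by digits, no underscore
gbPattern <- "^[A-Z]+[0-9]+$"
# Same rule with an optional "_" between prefix and digits, which also
# admits RefSeq-style IDs such as NM_001815
refseqPattern <- "^[A-Z]+_?[0-9]+$"

gbMatch <- grepl(gbPattern, ids)
refseqMatch <- grepl(refseqPattern, ids)

# Side-by-side view: NM_001815 is rejected by the first pattern only
print(data.frame(id = ids, gb = gbMatch, withUnderscore = refseqMatch))
```

Making the underscore optional (`_?`) rather than required keeps the plain GenBank accessions matching while adding the RefSeq ones.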
>From: John Zhang [mailto:jzhang at jimmy.harvard.edu]
>Sent: 17 January 2007 15:12
>To: bioconductor at stat.math.ethz.ch; b.otto at uke.uni-hamburg.de
>Subject: Re: [BioC] AnnBuilder's parseData() doesn't recognize ACCs with underscore?
>>parseData() seems to have problems recognizing accession numbers
>>that include an underscore, like "NM_001815". The function just doesn't
>>find them, although they do exist in the database file.
>You have used the wrong parser. There are parsers, such as egRefseqParser and
>gbNRef2LLParser, that handle RefSeq IDs with underscores. You need to pick
>one that fits your data.
>>Here is the example I'm trying to get working:
>>>pkgpath <- .find.package("AnnBuilder")
>>># UniGene info
>>>ugUrl <- "C:/Programme/R/R-2.4.1/library/AnnBuilder/data/Ths.data"
>>>ug <- UG(srcUrl = ugUrl, parser = file.path(pkgpath,
>>>"scripts", "gbUGParser"), baseFile = "geneNMap",
>>>organism = "Homo sapiens", built = "N/A", fromWeb = FALSE)
>>The geneNMap file has the entries:
>>The output I get is:
>> [,1] [,2]
>>32468_f_at "32468_f_at" "1084;63036"
>>32469_at "32469_at" "1084"
>>38912_at "38912_at" "10"
>>BF897514 "BF897514" "1084"
>>D90042 "D90042" "10"
>>Thanks a lot in advance for your help.
>>Universitaetsklinikum Eppendorf Hamburg
>>Institut fuer Klinische Chemie
>>Bioconductor mailing list
>>Bioconductor at stat.math.ethz.ch
>Department of Medical Oncology
>Dana-Farber Cancer Institute
>44 Binney Street
>Boston, MA 02115-6084