[BioC] AnnBuilder's parseData() doesn't recognize ACCs with underscore?
Benjamin Otto
b.otto at uke.uni-hamburg.de
Thu Jan 18 15:27:16 CET 2007
Hi John,
You're right, my problem comes from mixing GenBank accessions and RefSeq IDs,
so strictly speaking gbUGParser wouldn't be expected to find the RefSeqs (my
description of "accessions including underscores" was pretty dopey, I
admit). I just thought, probably in an attack of wild speculation, that the
"gb" scripts would automatically include the RefSeqs, because there are no
REF2xxxParsers and the gbNRef2LLParser is the only parser with RefSeq on the
input side (as far as I can remember). The gbNRef2LLParser returns LocusLink
IDs, but I would like to match UniGene IDs, and there seems to be no
"gbNREF2UGParser"...
So I should probably rename a copy of the gbUGParser to "gbNREF2UGParser"
and add the "_" to the regular expression.
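To illustrate what adding the "_" would change, here is a minimal sketch in R. The actual pattern inside gbUGParser is not shown in this thread, so both patterns below (gb_only and gb_or_refseq) are assumptions made up for illustration, not the parser's real expressions. The IDs are the accessions from the geneNMap example quoted further down.

```r
# Accessions taken from the geneNMap example in this thread.
ids <- c("D90278", "M16652", "L00693", "NM_001815",
         "BF897514", "BC028014", "D90042")

# ASSUMPTION: a GenBank-style pattern (1-2 letters followed by digits).
# This is NOT the real expression used by gbUGParser.
gb_only <- "^[A-Z]{1,2}[0-9]{5,6}$"

# ASSUMPTION: the same pattern with an optional "_" after the letters,
# so that RefSeq-style IDs such as NM_001815 are accepted as well.
gb_or_refseq <- "^[A-Z]{1,2}_?[0-9]{5,9}$"

matched_gb   <- ids[grepl(gb_only, ids)]
matched_both <- ids[grepl(gb_or_refseq, ids)]

# IDs that only the underscore-aware pattern picks up.
print(setdiff(matched_both, matched_gb))
```

With these assumed patterns, the only ID that the first expression misses and the second one catches is the RefSeq entry.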
Regards,
Benjamin
-----Original Message-----
From: John Zhang [mailto:jzhang at jimmy.harvard.edu]
Sent: 17 January 2007 15:12
To: bioconductor at stat.math.ethz.ch; b.otto at uke.uni-hamburg.de
Subject: Re: [BioC] AnnBuilder's parseData() doesn't recognize ACCs with
underscore?
>
>parseData() seems to have problems in recognition of accession numbers
>including an underscore like "NM_001815". The function just doesn't
>find them although they do exist in the database file.
You have used the wrong parser. There are parsers, such as egRefseqParser and
gbNRef2LLParser, that handle RefSeq IDs with underscores. You need to pick
one that fits your data.
>
>Here is the example I'm trying to get working:
>
>>library(AnnBuilder)
>>pkgpath <- .find.package("AnnBuilder")
>># unigene infos
>>ugUrl <- "C:/Programme/R/R-2.4.1/library/AnnBuilder/data/Ths.data"
>># parsing
>>ug <- UG(srcUrl = ugUrl, parser = file.path(pkgpath,
>>"scripts", "gbUGParser"), baseFile = "geneNMap",
>>organism = "Homo sapiens", built = "N/A", fromWeb = FALSE)
>>parseData(ug)
>
>The geneNMap file has the entries:
>
>32468_f_at D90278;M16652
>32469_at L00693
>NM_001815 NM_001815
>BF897514 BF897514
>38912_at D90042
>BC028014 BC028014
>D90042 D90042
>
>I get out:
> [,1] [,2]
>32468_f_at "32468_f_at" "1084;63036"
>32469_at "32469_at" "1084"
>38912_at "38912_at" "10"
>BF897514 "BF897514" "1084"
>D90042 "D90042" "10"
>
>
>Thanks a lot for your help in advance..
>
>Regards,
>
>Benjamin
>
>
>--
>Benjamin Otto
>Universitaetsklinikum Eppendorf Hamburg
>Institut fuer Klinische Chemie
>Martinistrasse 52
>20246 Hamburg
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/bioconductor
>Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
Jianhua Zhang
Department of Medical Oncology
Dana-Farber Cancer Institute
44 Binney Street
Boston, MA 02115-6084