[BioC] Genbank to Unigene IDs
John Zhang
jzhang at jimmy.harvard.edu
Fri Apr 16 15:33:11 CEST 2004
Sorry, the example code should be:
> ids <- c("AC010642", "AF414429", "X56654", "Y08432")
> ids2ll <- as.matrix(read.table("ftp://ftp.ncbi.nih.gov/refseq/LocusLink/loc2acc",
+     header = FALSE, sep = "\t"))
> # We only need the first and second columns
> ids2ll <- ids2ll[, c(1, 2)]
> colnames(ids2ll) <- c("LL", "GB")
> # Drop the version number
> ids2ll[,2] <- gsub("\\..*", "", ids2ll[,2])
> mapped <- ids2ll[is.element(ids2ll[,2], ids),]
> mapped
LL GB
1 " 1" "AC010642"
4 " 1" "AF414429"
10671 " 1828" "X56654"
10677 " 1830" "Y08432"
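
To try the matching step without downloading loc2acc, here is a minimal sketch on a small inline table. It assumes the same two-column layout as above (LocusLink ID first, then a GenBank accession with a version suffix). Only the four ID pairs are taken from the output above; the version suffixes and the last row (9999 / ZZ000000.1) are made up for illustration.

```r
# Hypothetical stand-in for a few rows of loc2acc: tab-separated,
# LocusLink ID in column 1, versioned GenBank accession in column 2.
# The version suffixes and the 9999 / ZZ000000.1 row are invented.
loc2acc_text <- paste(
  "1\tAC010642.5",
  "1\tAF414429.1",
  "1828\tX56654.1",
  "1830\tY08432.1",
  "9999\tZZ000000.1",
  sep = "\n"
)

ids <- c("AC010642", "AF414429", "X56654", "Y08432")

# Read everything as character so the IDs are not turned into numbers
# or factors, and strip padding so "   1" becomes "1"
ids2ll <- read.table(text = loc2acc_text, header = FALSE, sep = "\t",
                     col.names = c("LL", "GB"),
                     colClasses = "character", strip.white = TRUE)

# Drop the version number (everything from the first dot onward)
ids2ll$GB <- sub("\\..*", "", ids2ll$GB)

# Keep only the rows whose accession is in the query vector
mapped <- ids2ll[ids2ll$GB %in% ids, ]

# One LocusLink ID per query accession, in query order (NA if not found)
ll_for_ids <- ids2ll$LL[match(ids, ids2ll$GB)]
names(ll_for_ids) <- ids
```

Reading the columns as character with strip.white = TRUE avoids the padded values such as " 1" seen in the matrix output above. match() returns only the first hit per accession, so if one accession can appear under several LocusLink IDs the %in% subset is the safer of the two.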
>I think the most direct way of getting the ids mapped is to use the sources
>available at LocusLink (ftp://ftp.ncbi.nih.gov/refseq/LocusLink). If your
>target file contains GenBank accession numbers (e.g. "AC010642", "AC010642",
>...), read ftp://ftp.ncbi.nih.gov/refseq/LocusLink/loc2acc using read.table
>(sep = "\t") and then do the matching. If your target file contains RefSeq
>ids (e.g. "NM_130786", "NM_000014", ...), read
>ftp://ftp.ncbi.nih.gov/refseq/LocusLink/loc2ref instead. An example:
>
>> ids <- c("AC010642", "AF414429", "X56654", "Y08432")
>> ids2ll <- as.matrix(read.table("ftp://ftp.ncbi.nih.gov/refseq/LocusLink/loc2acc",
>+     header = FALSE, sep = "\t", strip.white = TRUE))
>> # We only need the second and third columns
>> ids2ll <- ids2ll[, c(2, 3)]
>> colnames(ids2ll) <- c("GB", "LL")
>> # Drop the version number
>> ids2ll[,1] <- gsub("\\..*", "", ids2ll[,1])
>> mapped <- ids2ll[is.element(ids2ll[,1], ids),]
>> mapped
> GB LL
>1 "AC010642" "-"
>4 "AF414429" "15778556"
>10671 "X56654" "30506"
>10677 "Y08432" "-"
>
>
>
>>
>>Thanks a lot
>>Gordon
>>
>>_______________________________________________
>>Bioconductor mailing list
>>Bioconductor at stat.math.ethz.ch
>>https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
>
>Jianhua Zhang
>Department of Biostatistics
>Dana-Farber Cancer Institute
>44 Binney Street
>Boston, MA 02115-6084