[BioC] locate a target species in Refseq ftp directory

James W. MacDonald jmacdon at uw.edu
Fri Oct 4 17:53:44 CEST 2013


Hi Heyi,

ftp://ftp.ncbi.nih.gov/refseq/release/release-notes/RefSeq-release61.txt

And NCBI says 'Ha ha on you - it's not by species!' A single file mixes records from several species. For example:

 zcat vertebrate_mammalian.1.1.genomic.fna.gz | grep \> | head
>gi|62867015|ref|NT_112066.2|NT_112066 Callithrix jacchus genomic sequence, ENCODE region ENr231
>gi|62871432|ref|NT_108597.2|NT_108597 Papio anubis genomic sequence, ENCODE region ENm002
>gi|62903504|ref|NT_086517.2|NT_086517 Callithrix jacchus genomic sequence, ENCODE region ENm014
>gi|62903506|ref|NT_113343.1|NT_113343 Dasypus novemcinctus genomic sequence, ENCODE region ENr231
>gi|62946791|ref|NT_113349.1|NT_113349 Papio anubis genomic sequence, ENCODE region ENr323, part 2 of 2
>gi|63025534|ref|NT_091694.3|NT_091694 Otolemur garnettii genomic sequence, ENCODE region ENm010
>gi|63145882|ref|NT_106990.3|NT_106990 Otolemur garnettii genomic sequence, ENCODE region ENr322
>gi|64724026|ref|NT_107822.2|NT_107822 Bos taurus genomic sequence, ENCODE region ENm002
>gi|64724078|ref|NT_107825.2|NT_107825 Bos taurus genomic sequence, ENCODE region ENm003
>gi|64724166|ref|NT_107827.2|NT_107827 Bos taurus genomic sequence, ENCODE region ENm004
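
Since the files are not split by species, one option is to scan every file and keep only the records whose header names the target organism. Below is a minimal sketch in Python, assuming the species name appears in each FASTA header as in the output above. The function names `read_fasta` and `extract_species` and the output file name are hypothetical, not part of any NCBI or Bioconductor tool.

```python
import gzip
from typing import Iterable, Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an open FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def extract_species(in_paths: Iterable[str], out_path: str,
                    species: str = "Ovis aries") -> int:
    """Copy records whose header mentions `species` to out_path.

    Assumes gzip-compressed FASTA input. Each kept sequence is written
    on a single line. Returns the number of records kept.
    """
    kept = 0
    with open(out_path, "w") as out:
        for path in in_paths:
            with gzip.open(path, "rt") as handle:
                for header, seq in read_fasta(handle):
                    if species in header:
                        out.write(f">{header}\n{seq}\n")
                        kept += 1
    return kept
```

To use it, pass the list of downloaded files (for example, everything matching vertebrate_mammalian.*.rna.fna.gz) and an output path such as ovis_aries.rna.fna. The match is a plain substring test, so it will miss any record whose header does not spell out the species name.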


Best,

Jim



On Friday, October 04, 2013 11:29:23 AM, heyi xiao wrote:
> Hi all,
> I am trying to extract the RNA sequences for sheep (Ovis aries) from the RefSeq ftp site. The right directory should be vertebrate_mammalian: ftp://ftp.ncbi.nlm.nih.gov/refseq/release/vertebrate_mammalian/
> But there are so many *rna* files there, all named with numbers, like vertebrate_mammalian.154.rna.fna.gz, and I am not sure which one is for my target species. The readme files don’t really help with this. Does anyone know how to locate the right file for a target species there?
> Heyi

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


