[BioC] locate a target species in Refseq ftp directory

David Iles D.E.Iles at leeds.ac.uk
Fri Oct 4 20:59:34 CEST 2013


Hi Heyi,

You could try the sheep sequencing consortium web site at the link below. It has links to GFF files with known and predicted mRNAs, along with the latest draft assembly of the sheep genome sequence (plus thousands of unmapped scaffolds and contigs).

http://www.livestockgenomics.csiro.au/sheep/oar3.1.php

Hope this helps.

Dr David Iles
Visiting Fellow
School of Biology
University of Leeds
Leeds LS2 9JT
UK
d.e.iles at leeds.ac.uk





On 4 Oct 2013, at 17:12, heyi xiao <xiaoheyiyh at yahoo.com> wrote:

Thanks Jim, for the hint.
That’s even worse; I will have to download and work through all the files now.
Heyi

--------------------------------------------
On Fri, 10/4/13, James W. MacDonald <jmacdon at uw.edu> wrote:

Subject: Re: [BioC] locate a target species in Refseq ftp directory

Cc: bioconductor at r-project.org
Date: Friday, October 4, 2013, 11:53 AM

Hi Heyi,

ftp://ftp.ncbi.nih.gov/refseq/release/release-notes/RefSeq-release61.txt

And NCBI says 'Ha ha on you - it's not by species!' For example:

 zcat vertebrate_mammalian.1.1.genomic.fna.gz | grep \> | head
gi|62867015|ref|NT_112066.2|NT_112066 Callithrix jacchus genomic sequence, ENCODE region ENr231
gi|62871432|ref|NT_108597.2|NT_108597 Papio anubis genomic sequence, ENCODE region ENm002
gi|62903504|ref|NT_086517.2|NT_086517 Callithrix jacchus genomic sequence, ENCODE region ENm014
gi|62903506|ref|NT_113343.1|NT_113343 Dasypus novemcinctus genomic sequence, ENCODE region ENr231
gi|62946791|ref|NT_113349.1|NT_113349 Papio anubis genomic sequence, ENCODE region ENr323, part 2 of 2
gi|63025534|ref|NT_091694.3|NT_091694 Otolemur garnettii genomic sequence, ENCODE region ENm010
gi|63145882|ref|NT_106990.3|NT_106990 Otolemur garnettii genomic sequence, ENCODE region ENr322
gi|64724026|ref|NT_107822.2|NT_107822 Bos taurus genomic sequence, ENCODE region ENm002
gi|64724078|ref|NT_107825.2|NT_107825 Bos taurus genomic sequence, ENCODE region ENm003
gi|64724166|ref|NT_107827.2|NT_107827 Bos taurus genomic sequence, ENCODE region ENm004
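
Since the files mix species, one option is to stream every downloaded file and keep only the records whose header names the target species. The Python below is a rough sketch of that idea, not a tested pipeline. It assumes the species name appears verbatim in each FASTA header, as in the listing above, and that the files are gzipped FASTA. The function names (read_fasta, species_records, write_fasta) and the output file name are hypothetical.

```python
import gzip
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def species_records(paths: Iterable[str],
                    species: str = "Ovis aries") -> Iterator[Tuple[str, str]]:
    """Stream records whose header contains `species` from gzipped FASTA files.

    Assumption: a plain substring match on the header is good enough.
    """
    for path in paths:
        with gzip.open(path, "rt") as handle:
            for header, seq in read_fasta(handle):
                if species in header:
                    yield header, seq


def write_fasta(records: Iterable[Tuple[str, str]],
                out_path: str, width: int = 70) -> int:
    """Write records as FASTA with wrapped sequence lines; return the count."""
    count = 0
    with open(out_path, "w") as out:
        for header, seq in records:
            out.write(f">{header}\n")
            for i in range(0, len(seq), width):
                out.write(seq[i:i + width] + "\n")
            count += 1
    return count
```

With the *rna* files downloaded into the working directory, a call such as write_fasta(species_records(sorted(glob.glob("vertebrate_mammalian.*.rna.fna.gz"))), "ovis_aries.rna.fna") would collect the sheep records into one file. Everything is streamed record by record, so memory use stays bounded by the longest single sequence.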


Best,

Jim



On Friday, October 04, 2013 11:29:23 AM, heyi xiao wrote:
Hi all,
I am trying to extract the RNA sequences for sheep (Ovis aries) from the RefSeq ftp site. The right directory should be vertebrate_mammalian: ftp://ftp.ncbi.nlm.nih.gov/refseq/release/vertebrate_mammalian/
But there are so many *rna* files there, all named with numbers, like vertebrate_mammalian.154.rna.fna.gz, and I am not sure which one is for my target species. The README files don’t really help with this. Does anyone know how to locate the right file for a target species there?
Heyi

_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099



