[Bioc-devel] Bioconductor Digest, Vol 95, Issue 7
Jack Zhu
zhujack at mail.nih.gov
Fri Jan 7 21:27:49 CET 2011
Hi Mark and Sean,
As Sean mentioned, the NCBI SRA group has removed the fastq data files from
its ftp site and now supplies sra or sra-lite data files for
download. To deal with these significant changes, I have
modified the SRAdb package (in both the 2.7 release and the devel version):
1. Removed the functions listFastq, getFastqInfo and getFastq.
2. Added the functions listSRAfile, getSRAinfo and getSRAfile.
3. Updated the corresponding files to reflect these changes.
Examples of the new functions:
library(SRAdb)
# download the SRAmetadb.sqlite database file
getSRAdbFile()
sra_dbname <- 'SRAmetadb.sqlite'
# open a connection to the local SQLite database
sra_con <- dbConnect(dbDriver("SQLite"), sra_dbname)
To list the sra-lite data file names, including their ftp addresses,
associated with "SRX000122":
> rs <- listSRAfile("SRX000122", sra_con = sra_con, sraType = "litesra")
> rs[1:2,]
  experiment sra
1 SRX000122 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX000/SRX000122/SRR000648/SRR000648.lite.sra
2 SRX000122 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX000/SRX000122/SRR000649/SRR000649.lite.sra
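If you need the run accessions rather than the full URLs, they can be recovered from the file names in the sra column. This is a minimal base-R sketch, assuming the listSRAfile result is a data frame with a character column named `sra` holding URLs like those above; the helper `run_from_url` is hypothetical, and the stand-in data frame simply copies the two rows shown so no network access is needed.

```r
# Hypothetical helper: extract the run accession (e.g. "SRR000648")
# from an sra/sra-lite ftp URL by taking the file name and stripping
# a trailing ".lite.sra" or ".sra" extension.
run_from_url <- function(url) {
  sub("(\\.lite)?\\.sra$", "", basename(url))
}

# Stand-in for the listSRAfile() result shown above
rs <- data.frame(
  experiment = c("SRX000122", "SRX000122"),
  sra = c(
    "ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX000/SRX000122/SRR000648/SRR000648.lite.sra",
    "ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX000/SRX000122/SRR000649/SRR000649.lite.sra"
  ),
  stringsAsFactors = FALSE
)

rs$run <- run_from_url(rs$sra)
print(rs$run)
```

The resulting accessions could then be passed as in_acc to getSRAfile, which, as this message shows, accepts SRR accessions.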
The above function does not check the availability, size, or date of
the sra or sra-lite data files on the server. The getSRAinfo function
does, which is useful to know if you are preparing to download them:
> rs <- getSRAinfo(in_acc = c("SRX000122"), sra_con = sra_con)
> rs[1:2, ]
  sra experiment size(KB)
1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX000/SRX000122/SRR000648/SRR000648.lite.sra SRX000122 104
2 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX000/SRX000122/SRR000649/SRR000649.lite.sra SRX000122 50536
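Before downloading, the size column can be used to estimate how much data will be transferred. A minimal sketch, assuming the getSRAinfo result has a column named exactly `size(KB)` as printed above (the as.numeric() call guards against it coming back as character); the stand-in data frame reuses the two rows shown, and the 1024 KB cutoff is an arbitrary example.

```r
# Stand-in for the getSRAinfo() result shown above (no network needed).
# check.names = FALSE keeps the column name "size(KB)" as printed
# instead of R's default rewrite to "size.KB.".
info <- data.frame(
  sra = c(
    "ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX000/SRX000122/SRR000648/SRR000648.lite.sra",
    "ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX000/SRX000122/SRR000649/SRR000649.lite.sra"
  ),
  experiment = c("SRX000122", "SRX000122"),
  "size(KB)" = c(104, 50536),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Total download size; as.numeric() in case the column is character
sizes_kb <- as.numeric(info[["size(KB)"]])
total_kb <- sum(sizes_kb)
cat(sprintf("Total to download: %.0f KB (about %.1f MB)\n",
            total_kb, total_kb / 1024))

# Keep only files below an arbitrary 1024 KB limit, e.g. for a test run
small <- info[sizes_kb < 1024, ]
print(small$sra)
```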
Next, you may want to download the sra or sra-lite data files from the
ftp site. The getSRAfile function downloads all available sra or
sra-lite data files associated with "SRR000648" and "SRR000657" from the
NCBI SRA ftp site into the directory given by destdir (here, the current
working directory):
> getSRAfile(in_acc = c("SRR000648", "SRR000657"), sra_con = sra_con, destdir = getwd(), sraType = "litesra", method='curl')
Files are saved to: '/Users/zhujack/Documents/R'
100  103k  100  103k    0     0  57382      0  0:00:01  0:00:01 --:--:--  124k
100  154k  100  154k    0     0   132k      0  0:00:01  0:00:01 --:--:--  217k
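After getSRAfile finishes, it can be worth checking the files on disk against the sizes getSRAinfo reported. A minimal base-R sketch; the helper `check_sizes` is hypothetical, it assumes "KB" in the getSRAinfo output means 1024 bytes, and `tol_kb` is an arbitrary allowance for rounding. The demo uses temporary files rather than real downloads.

```r
# Hypothetical helper: compare on-disk sizes of downloaded files with the
# sizes reported by getSRAinfo(). Assumes "KB" means 1024 bytes; tol_kb
# absorbs rounding in the reported value.
check_sizes <- function(paths, expected_kb, tol_kb = 1) {
  actual_kb <- file.info(paths)$size / 1024   # NA for missing files
  ok <- !is.na(actual_kb) & abs(actual_kb - expected_kb) <= tol_kb
  data.frame(file = basename(paths),
             expected_kb = expected_kb,
             actual_kb = round(actual_kb, 1),
             ok = ok,
             stringsAsFactors = FALSE)
}

# Self-contained demo with temporary files instead of real downloads
demo_dir <- tempfile("sra_demo_")
dir.create(demo_dir)
f1 <- file.path(demo_dir, "demo1.lite.sra")
writeBin(raw(2048), f1)                         # a 2 KB file
f2 <- file.path(demo_dir, "missing.lite.sra")   # never created
res <- check_sizes(c(f1, f2), expected_kb = c(2, 5))
print(res)
```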
Your suggestions will be greatly appreciated.
Jack
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 5 Jan 2011 11:29:10 +0000
> From: Mark Dunning <mark.dunning at gmail.com>
> To: bioconductor at stat.math.ethz.ch
> Subject: [BioC] SRAdb listFastq error
> Message-ID:
> <AANLkTikvwCOfvccvgFt4M_fY5WTU4GDP8te1au9VycLc at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hi,
>
> I am hoping to download some Fastq files from the Short Read Archive
> and am following the vignette for SRAdb. However, I get an error when
> trying the example of using listFastq
>
>> listFastq("SRA011804", sra_con = sra_con)
> Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) :
> Server denied you to change to the given directory
>
> I think I have set up the sra_con object correctly
>
>> sra_con
> <SQLiteConnection: DBI CON (2626, 2)>
>> dbListFields(sra_con, "study")
> [1] "study_ID" "study_alias" "study_accession"
> [4] "study_title" "study_type" "study_abstract"
> [7] "center_name" "center_project_name" "project_id"
> [10] "study_description" "study_url_link" "study_entrez_link"
> [13] "study_attribute" "submission_accession" "sradb_updated"
>>
>
> Cheers,
>
> Mark
>
>
>> sessionInfo()
> R version 2.12.0 (2010-10-15)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_GB.utf8 LC_NUMERIC=C
> [3] LC_TIME=en_GB.utf8 LC_COLLATE=en_GB.utf8
> [5] LC_MONETARY=C LC_MESSAGES=en_GB.utf8
> [7] LC_PAPER=en_GB.utf8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] SRAdb_1.4.0 graph_1.28.0 RSQLite_0.9-3 DBI_0.2-5
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.10.0 GEOquery_2.16.3 RCurl_1.4-3 tools_2.12.0
> [5] XML_3.2-0
>
>
>
> ------------------------------
>
> Message: 2
> Date: Wed, 5 Jan 2011 06:54:33 -0500
> From: Sean Davis <sdavis2 at mail.nih.gov>
> To: Mark Dunning <mark.dunning at gmail.com>
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] SRAdb listFastq error
> Message-ID:
> <AANLkTintg+cgH35_5ZUTbKqwNaoBVwmLP-absQ0b6Cti at mail.gmail.com>
> Content-Type: text/plain
>
> On Wed, Jan 5, 2011 at 6:29 AM, Mark Dunning <mark.dunning at gmail.com> wrote:
>
>> Hi,
>>
>> I am hoping to download some Fastq files from the Short Read Archive
>> and am following the vignette for SRAdb. However, I get an error when
>> trying the example of using listFastq
>>
>> > listFastq("SRA011804", sra_con = sra_con)
>> Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) :
>> Server denied you to change to the given directory
>>
>>
> Hi, Mark. Unfortunately, we are going to have to remove fastq access tools
> as NCBI has removed the fastq files:
>
> http://www.ncbi.nlm.nih.gov/books/NBK49286/#SRA_Usability_Chang.2_Static_fastq_dumps
>
> We are looking at workarounds for this, but for the time being, the
> functionality is broken and will not likely be retained in the same form.
>
> Sean
>
>
>
>
>
> ------------------------------
>
> Message: 3
> Date: Wed, 5 Jan 2011 08:13:28 -0500
> From: Sean Davis <sdavis2 at mail.nih.gov>
> To: Mark Dunning <mark.dunning at gmail.com>
> Cc: Bioconductor Newsgroup <bioconductor at stat.math.ethz.ch>
> Subject: Re: [BioC] SRAdb listFastq error
> Message-ID:
> <AANLkTin7LzRfRnDLTPEc4C-zVbnKhazcAuGQtphrG+s0 at mail.gmail.com>
> Content-Type: text/plain
>
> On Wed, Jan 5, 2011 at 7:47 AM, Mark Dunning <mark.dunning at gmail.com> wrote:
>
>> Hi Sean,
>>
>> That's a shame. Thanks for the information. I think I can get the
>> fastqs I need by another means though.
>>
>>
> You don't have to work too hard at it. The process is described here:
>
> http://www.ncbi.nlm.nih.gov/books/NBK50846/#UsingToolKit_BK.3_Converting_SRA_format
>
> In short, you'll need the SRA SDK to do so.
>
> http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?cmd=show&f=software&m=software&s=software
>
> Binaries are available for several architectures and OSes.
>
> Sean
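Once the sra or sra-lite files are on disk, the conversion Sean describes can be scripted from R. A minimal sketch, assuming the SRA Toolkit's fastq-dump is installed and on the PATH and that your version accepts `-O <outdir>` for the output directory (check its help output); the helper `fastq_dump_all` and the file names in the demo are hypothetical. By default it only builds the argument lists so they can be inspected before anything is run.

```r
# Hypothetical helper: build (and optionally run) one fastq-dump call per
# SRA file. Assumes fastq-dump from the SRA Toolkit is on the PATH and
# supports "-O <outdir>"; verify against your toolkit version.
fastq_dump_all <- function(sra_files, outdir = ".", run = FALSE) {
  cmds <- lapply(sra_files, function(f) c("-O", outdir, f))
  if (run) {
    if (!nzchar(Sys.which("fastq-dump")))
      stop("fastq-dump not found on the PATH")
    for (args in cmds) {
      status <- system2("fastq-dump", args = shQuote(args))
      if (status != 0)
        warning("fastq-dump failed for ", args[length(args)])
    }
  }
  invisible(cmds)
}

# Dry run with hypothetical file names: inspect arguments only
cmds <- fastq_dump_all(c("SRR000648.lite.sra", "SRR000657.lite.sra"),
                       outdir = "fastq")
print(cmds)
# Real run over downloaded files (requires fastq-dump):
# fastq_dump_all(list.files(pattern = "\\.sra$"), outdir = "fastq", run = TRUE)
```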
>
>
>
>> Regards,
>>
>> Mark
>>
>
>
>