[BioC] Bioconductor Digest, Vol 95, Issue 7

Jack Zhu zhujack at mail.nih.gov
Fri Jan 7 21:27:49 CET 2011


Hi Mark and Sean,

As Sean mentioned, the NCBI SRA group has removed the fastq data files
from their ftp site and now supplies sra or sra-lite data files for
download instead.  To deal with this significant change, I have
modified the SRAdb package (in both the 2.7 release and the devel version):

1. Removed the functions listFastq, getFastqInfo and getFastq
2. Added the functions listSRAfile, getSRAinfo and getSRAfile
3. Modified the corresponding files to reflect the change.

Examples of new functions:

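	library(SRAdb)
	# Download the SRAmetadb.sqlite metadata file (a large download)
	getSRAdbFile()
	sra_dbname <- 'SRAmetadb.sqlite'
	# Open a connection to the SQLite metadata database
	sra_con <- dbConnect(dbDriver("SQLite"), sra_dbname)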

List the sra-lite data files, including their ftp addresses, associated
with "SRX000122":

> rs <- listSRAfile("SRX000122", sra_con = sra_con, sraType = "litesra")
> rs[1:2,]

  experiment sra
1  SRX000122 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX000/SRX000122/SRR000648/SRR000648.lite.sra
2  SRX000122 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX000/SRX000122/SRR000649/SRR000649.lite.sra
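
The two addresses above appear to follow a fixed directory layout. As a
rough sketch only, the layout can be reproduced in base R. The helper
build_litesra_url is hypothetical (it is not part of SRAdb), and the
layout is inferred from these two rows alone, so it may not hold for
other accessions or after later changes to the ftp site:

```r
# Hypothetical helper, not part of SRAdb: rebuild an sra-lite ftp address
# from an experiment accession and a run accession. The directory layout
# is an assumption inferred from the two listing rows above.
build_litesra_url <- function(experiment, run,
    base = "ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/litesra") {
  # One experiment may be recycled over several runs
  stopifnot(length(experiment) %in% c(1L, length(run)))
  paste(base,
        substr(experiment, 1, 3),           # accession prefix, e.g. "SRX"
        substr(experiment, 1, 6),           # first six characters, e.g. "SRX000"
        experiment,
        run,
        paste(run, ".lite.sra", sep = ""),  # file name
        sep = "/")
}

urls <- build_litesra_url("SRX000122", c("SRR000648", "SRR000649"))
print(urls)
```

listSRAfile remains the way to get these addresses; the sketch is only
meant to make the path structure easier to read.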

The function above does not check the availability, size and date of
the sra or sra-lite data files on the server. The getSRAinfo function
does, which is useful to know before you download them:

> rs <- getSRAinfo(in_acc = c("SRX000122"),  sra_con = sra_con)
> rs[1:2, ]

  sra                                                                                                                    experiment size(KB)
1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX000/SRX000122/SRR000648/SRR000648.lite.sra  SRX000122      104
2 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX000/SRX000122/SRR000649/SRR000649.lite.sra  SRX000122    50536
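
The size column can be used to decide what to fetch before starting a
download. A minimal sketch in base R follows. It does not call
getSRAinfo; it uses a toy data frame that copies the two rows printed
above. The assumptions are that the real result is a data frame with a
column named "size(KB)" that holds (or can be converted to) numbers, and
the max_kb threshold is an arbitrary illustrative value:

```r
# Toy stand-in for the getSRAinfo result, copied from the two rows above.
# check.names = FALSE keeps the column name "size(KB)" unchanged.
rs <- data.frame(
  sra = c(
    "ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX000/SRX000122/SRR000648/SRR000648.lite.sra",
    "ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX000/SRX000122/SRR000649/SRR000649.lite.sra"),
  experiment = c("SRX000122", "SRX000122"),
  "size(KB)" = c(104, 50536),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Assumption: sizes may arrive as text, so convert before comparing
size_kb <- as.numeric(rs[["size(KB)"]])

max_kb   <- 1024                      # hypothetical per-file limit (1 MB)
small    <- rs[size_kb <= max_kb, ]   # files at or under the limit
total_kb <- sum(size_kb)              # total size of all listed files

print(small$sra)
print(total_kb)
```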


Next you might want to download sra or sra-lite data files from the
ftp site. The getSRAfile function downloads all available sra or
sra-lite data files associated with the given accessions, here
"SRR000648" and "SRR000657", from the NCBI SRA ftp site to the directory
given by destdir (the current working directory in this example):

> getSRAfile(in_acc = c("SRR000648", "SRR000657"), sra_con = sra_con, destdir = getwd(), sraType = "litesra", method='curl')

Files are saved to: '/Users/zhujack/Documents/R'

100  103k  100  103k    0     0  57382      0  0:00:01  0:00:01 --:--:--  124k
100  154k  100  154k    0     0   132k      0  0:00:01  0:00:01 --:--:--  217k
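
After a download it can be worth confirming that the expected files are
actually on disk. The sketch below uses only base R. The helper
check_downloads is hypothetical (it is not part of SRAdb), and it assumes
that files are saved in destdir under the base name of their ftp address,
which this message does not confirm. The demonstration writes a dummy
file into a temporary directory instead of touching the network:

```r
# Hypothetical helper, not part of SRAdb: report which expected files
# are present in destdir, with their size in KB (NA when missing).
# Assumption: each file is saved under the base name of its ftp address.
check_downloads <- function(urls, destdir = getwd()) {
  local_path <- file.path(destdir, basename(urls))
  data.frame(
    file    = basename(urls),
    exists  = file.exists(local_path),
    size_kb = round(file.info(local_path)$size / 1024),
    stringsAsFactors = FALSE
  )
}

# Demonstration with a dummy 2 KB file in a temporary directory
demo_dir <- tempfile("sra_demo")
dir.create(demo_dir)
writeBin(raw(2048), file.path(demo_dir, "SRR000648.lite.sra"))

status <- check_downloads(
  c("ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX000/SRX000122/SRR000648/SRR000648.lite.sra",
    "ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX000/SRX000122/SRR000649/SRR000649.lite.sra"),
  destdir = demo_dir)
print(status)
```

In real use, urls would be the sra column returned by listSRAfile and
destdir the directory passed to getSRAfile.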

Your suggestions will be greatly appreciated.

Jack



> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 5 Jan 2011 11:29:10 +0000
> From: Mark Dunning <mark.dunning at gmail.com>
> To: bioconductor at stat.math.ethz.ch
> Subject: [BioC] SRAdb listFastq error
> Message-ID:
>        <AANLkTikvwCOfvccvgFt4M_fY5WTU4GDP8te1au9VycLc at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hi,
>
> I am hoping to download some Fastq files from the Short Read Archive
> and am following the vignette for SRAdb. However, I get an error when
> trying the example of using listFastq
>
>>  listFastq("SRA011804", sra_con = sra_con)
> Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) :
>  Server denied you to change to the given directory
>
> I think I have setup the sra_con object correctly
>
>> sra_con
> <SQLiteConnection: DBI CON (2626, 2)>
>>  dbListFields(sra_con, "study")
>  [1] "study_ID"             "study_alias"          "study_accession"
>  [4] "study_title"          "study_type"           "study_abstract"
>  [7] "center_name"          "center_project_name"  "project_id"
> [10] "study_description"    "study_url_link"       "study_entrez_link"
> [13] "study_attribute"      "submission_accession" "sradb_updated"
>>
>
> Cheers,
>
> Mark
>
>
>> sessionInfo()
> R version 2.12.0 (2010-10-15)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C
>  [3] LC_TIME=en_GB.utf8        LC_COLLATE=en_GB.utf8
>  [5] LC_MONETARY=C             LC_MESSAGES=en_GB.utf8
>  [7] LC_PAPER=en_GB.utf8       LC_NAME=C
>  [9] LC_ADDRESS=C              LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] SRAdb_1.4.0   graph_1.28.0  RSQLite_0.9-3 DBI_0.2-5
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.10.0  GEOquery_2.16.3 RCurl_1.4-3     tools_2.12.0
> [5] XML_3.2-0
>
>
>
> ------------------------------
>
> Message: 2
> Date: Wed, 5 Jan 2011 06:54:33 -0500
> From: Sean Davis <sdavis2 at mail.nih.gov>
> To: Mark Dunning <mark.dunning at gmail.com>
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] SRAdb listFastq error
> Message-ID:
>        <AANLkTintg+cgH35_5ZUTbKqwNaoBVwmLP-absQ0b6Cti at mail.gmail.com>
> Content-Type: text/plain
>
> On Wed, Jan 5, 2011 at 6:29 AM, Mark Dunning <mark.dunning at gmail.com> wrote:
>
>> Hi,
>>
>> I am hoping to download some Fastq files from the Short Read Archive
>> and am following the vignette for SRAdb. However, I get an error when
>> trying the example of using listFastq
>>
>> >  listFastq("SRA011804", sra_con = sra_con)
>> Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) :
>>  Server denied you to change to the given directory
>>
>>
> Hi, Mark.  Unfortunately, we are going to have to remove fastq access tools
> as NCBI has removed the fastq files:
>
> http://www.ncbi.nlm.nih.gov/books/NBK49286/#SRA_Usability_Chang.2_Static_fastq_dumps
>
> We are looking at workarounds for this, but for the time being, the
> functionality is broken and will not likely be retained in the same form.
>
> Sean
>
>
>
>
>
>
> ------------------------------
>
> Message: 3
> Date: Wed, 5 Jan 2011 08:13:28 -0500
> From: Sean Davis <sdavis2 at mail.nih.gov>
> To: Mark Dunning <mark.dunning at gmail.com>
> Cc: Bioconductor Newsgroup <bioconductor at stat.math.ethz.ch>
> Subject: Re: [BioC] SRAdb listFastq error
> Message-ID:
>        <AANLkTin7LzRfRnDLTPEc4C-zVbnKhazcAuGQtphrG+s0 at mail.gmail.com>
> Content-Type: text/plain
>
> On Wed, Jan 5, 2011 at 7:47 AM, Mark Dunning <mark.dunning at gmail.com> wrote:
>
>> Hi Sean,
>>
>> That's a shame. Thanks for the information. I think I can get the
>> fastqs I need by another means though.
>>
>>
> You don't have to work too hard at it.  The process is described here:
>
> http://www.ncbi.nlm.nih.gov/books/NBK50846/#UsingToolKit_BK.3_Converting_SRA_format
>
> In short, you'll need the SRA SDK to do so.
>
> http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?cmd=show&f=software&m=software&s=software
>
> Binaries are available for several architectures and OSes.
>
> Sean
>
>
>
>> Regards,
>>
>> Mark
>>
>
>
>
>


