[Bioc-devel] SRAdb missing runs
Jack Zhu
zhujack at mail.nih.gov
Tue Oct 4 23:10:03 CEST 2011
Hi Malcolm,
Recently another user also found missing SRA records in the SRAdb
database. I looked into it, and the problem appears to be
with the XML files on the NCBI SRA ftp
site. So I modified the package and switched the main download
source for the SRA XML files to EBI. It seems to be working now. Please let
me know if you still see any problems. Thanks.
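
One way to check a freshly downloaded database is to ask the sra table which of a given set of run accessions it actually contains. The following is only a minimal sketch in Python, not part of SRAdb: the helper names find_missing_runs and build_toy_db are hypothetical, and the in-memory toy table holds just the two rows reported earlier in this thread. The table name (sra) and column names are taken from Malcolm's sqlite3 query.

```python
import sqlite3


def find_missing_runs(conn, run_accessions):
    """Return the accessions (in input order) that have no row in the sra table."""
    wanted = list(run_accessions)
    if not wanted:
        return []
    # One "?" placeholder per accession keeps the query parameterized.
    placeholders = ",".join("?" for _ in wanted)
    query = (
        "SELECT DISTINCT run_accession FROM sra "
        f"WHERE run_accession IN ({placeholders})"
    )
    found = {row[0] for row in conn.execute(query, wanted)}
    return [acc for acc in wanted if acc not in found]


def build_toy_db():
    """Build an in-memory stand-in for SRAmetadb.sqlite (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sra ("
        "study_accession TEXT, submission_accession TEXT, "
        "sample_accession TEXT, experiment_accession TEXT, "
        "run_accession TEXT, sample_alias TEXT)"
    )
    # The two rows that the original query in this thread returned.
    rows = [
        ("SRP001537", "SRA010243", "SRS008471", "SRX014483",
         "SRR031766", "S2_DRSC_CG10128_RNAi-1"),
        ("SRP001537", "SRA010243", "SRS008471", "SRX014483",
         "SRR031767", "S2_DRSC_CG10128_RNAi-1"),
    ]
    conn.executemany("INSERT INTO sra VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    conn = build_toy_db()
    # Prints ['SRR074430'] for the toy data above.
    print(find_missing_runs(conn, ["SRR031766", "SRR031767", "SRR074430"]))
```

Against the real file you would pass sqlite3.connect("SRAmetadb.sqlite") instead of the toy connection; an empty list would then mean all of the requested runs are present.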
Jack
On 19 September 2011 08:41, Sean Davis <sdavis2 at mail.nih.gov> wrote:
> Hi, Malcolm. I submitted a ticket to SRA. They have assigned the
> ticket already. We'll keep you updated on the outcome as it
> definitely impacts the utilization of SRA by us (SRAdb) and others.
>
> Sean
>
>
> On Mon, Sep 19, 2011 at 8:25 AM, Cook, Malcolm <MEC at stowers.org> wrote:
>> Jack,
>>
>> Thanks for the reply.
>>
>> I'm actually not that savvy about the internals of SRA and GEO at NCBI. I've cobbled together my first RNA-seq submission to GEO, which in turn submits to SRA. The reads in question are from the modENCODE project, which submits to GEO, which submits to SRA. I've not tried to work out why some of these files have gone missing from the XML. Do you think this is something to report to modENCODE, GEO, or NCBI?
>>
>> Cheers,
>>
>> Malcolm
>>
>> ________________________________________
>> From: yuelin at gmail.com [yuelin at gmail.com] On Behalf Of Jack Zhu [zhujack at mail.nih.gov]
>> Sent: Friday, September 16, 2011 10:21 PM
>> To: Cook, Malcolm
>> Cc: bioc-devel at r-project.org; Sean Davis
>> Subject: Re: [Bioc-devel] SRAdb missing runs
>>
>> Hi Malcolm,
>>
>> I am really sorry that I missed your post, but thank you very much for
>> the report.
>>
>> I have reproduced the problem you found. After a little investigation, it
>> looks like the missing runs in SRAdb are caused by NCBI
>> failing to update the XML files.
>>
>> As you know, all the data in SRAdb comes from the NCBI SRA XML files,
>> which are downloaded from the NCBI ftp site
>> (ftp://ftp.ncbi.nih.gov/sra/Submissions/). As shown on this page,
>> http://www.ncbi.nlm.nih.gov/sra/SRX032508, SRR074430 was submitted
>> through SRA010243. Unfortunately the SRA010243 XML file on the NCBI
>> ftp site (ftp://ftp.ncbi.nih.gov/sra/Submissions/SRA010/SRA010243/)
>> does not include SRR074430 or SRX032508, apparently because the XML
>> files were not updated when new runs/samples were added.
>>
>> Malcolm, currently we are looking into new mechanisms to update SRAdb
>> and hopefully the problem will be fixed soon.
>>
>> Thanks again.
>>
>> Jack
>>
>>
>>
>> On 16 September 2011 07:06, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>>> Sorry, Malcolm.
>>>
>>> We'll look into it. Thanks for the report.
>>>
>>> Sean
>>>
>>>
>>> On Wed, Sep 14, 2011 at 5:09 PM, Cook, Malcolm <MEC at stowers.org> wrote:
>>>> Hi Sean, Jack, and fellow SRAdb users,
>>>>
>>>> Sean, I failed to cc you the first time around. Perhaps you have a suggestion for me?
>>>>
>>>> I remain perplexed as to why selected SRA runs fail to appear in SRAdb.
>>>>
>>>> Does anyone else have experience with or advice on this?
>>>>
>>>> Thanks much,
>>>>
>>>> ~Malcolm
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Cook, Malcolm
>>>> Sent: Friday, September 09, 2011 4:15 PM
>>>> To: 'bioc-devel at r-project.org'; 'zhujack at mail.nih.gov'
>>>> Subject: SRAdb missing runs
>>>>
>>>> Hi Jack and other SRAdb users,
>>>>
>>>> I find at least one SRA run missing from the sqlite database obtained from a fresh `getSRAdbFile()`
>>>>
>>>> SRR074430 is present in the SRA http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?cmd=viewer&m=data&s=viewer&run=SRR074430
>>>>
>>>> but directly querying the sqlite3 database fails to find it:
>>>>
>>>> sqlite3 -list SRAmetadb.sqlite "select study_accession, submission_accession, sample_accession, experiment_accession, run_accession, sample_alias from sra where run_accession in ('SRR031766','SRR031767','SRR074430')"
>>>> SRP001537|SRA010243|SRS008471|SRX014483|SRR031766|S2_DRSC_CG10128_RNAi-1
>>>> SRP001537|SRA010243|SRS008471|SRX014483|SRR031767|S2_DRSC_CG10128_RNAi-1
>>>>
>>>> Can anyone advise me as to the origin of this discrepancy, or perhaps correct a misunderstanding I may have in using this resource?
>>>>
>>>> I just downloaded a fresh SRAdbFile... here is the "Metadata associate with downloaded file:"
>>>>
>>>> c("schema version", "creation timestamp")c("1.0", "2011-09-03 10:38:16")
>>>>
>>>>
>>>> Below is a full transcript with sessionInfo(), if it helps.
>>>>
>>>> Thanks!
>>>>
>>>> Malcolm Cook
>>>> Computational Biology - Stowers Institute for Medical Research
>>>>
>>>>> library('SRAdb')
>>>>> sqlfile <- getSRAdbFile()
>>>> trying URL 'http://gbnci.abcc.ncifcrf.gov/backup/SRAmetadb.sqlite.gz'
>>>> Content type 'text/plain; charset=ISO-8859-1' length 38391904 bytes (36.6 Mb)
>>>> opened URL
>>>> ==================================================
>>>> downloaded 36.6 Mb
>>>>
>>>> Unzipping...
>>>>
>>>> Metadata associate with downloaded file:
>>>>
>>>> c("schema version", "creation timestamp")c("1.0", "2011-09-03 10:38:16")
>>>>> sessionInfo()
>>>> R version 2.13.1 (2011-07-08)
>>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>>>
>>>> locale:
>>>> [1] C
>>>>
>>>> attached base packages:
>>>> [1] stats graphics grDevices utils datasets methods base
>>>>
>>>> other attached packages:
>>>> [1] SRAdb_1.6.0 RCurl_1.5-0 bitops_1.0-4.1 graph_1.30.0 RSQLite_0.9-4
>>>> [6] DBI_0.2-5
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] Biobase_2.12.2 GEOquery_2.19.2 XML_3.4-0 tools_2.13.1
>>>>> q('no')
>>>> bash-3.2$ sqlite3 -list SRAmetadb.sqlite "select study_accession, submission_accession, sample_accession, experiment_accession, run_accession, sample_alias from sra where run_accession in ('SRR031766','SRR031767','SRR074430')"
>>>> SRP001537|SRA010243|SRS008471|SRX014483|SRR031766|S2_DRSC_CG10128_RNAi-1
>>>> SRP001537|SRA010243|SRS008471|SRX014483|SRR031767|S2_DRSC_CG10128_RNAi-1
>>>>
>>>> _______________________________________________
>>>> Bioc-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel