[Bioc-devel] SRAdb missing runs

Jack Zhu zhujack at mail.nih.gov
Sat Sep 17 05:21:24 CEST 2011


Hi Malcolm,

I am really sorry that I missed your post, but thank you very much for
the report.

I have reproduced the problem you found. After a little investigation,
it looks like the missing runs in SRAdb are caused by NCBI failing to
update its XML files.

As you know, all the data in SRAdb come from the NCBI SRA XML files,
which are downloaded from the NCBI FTP site
(ftp://ftp.ncbi.nih.gov/sra/Submissions/). As shown on this page,
http://www.ncbi.nlm.nih.gov/sra/SRX032508, SRR074430 was submitted
through SRA010243. Unfortunately, the SRA010243 XML file on the NCBI
FTP site (ftp://ftp.ncbi.nih.gov/sra/Submissions/SRA010/SRA010243/)
does not include SRR074430 or SRX032508, apparently because the XML
files were not updated when the new runs/samples were added.
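
If a run is absent from the upstream XML, it is simply absent from the
`sra` table, and an `IN (...)` query like the one in the quoted message
below returns only the rows it finds without flagging the rest. The
following is a minimal Python sketch for listing the requested run
accessions that have no row, assuming the `sra` table and
`run_accession` column used in that query. The helper name
`missing_runs` and the in-memory demo table are hypothetical and are
not part of SRAdb.

```python
import sqlite3
from typing import Iterable, List


def missing_runs(conn: sqlite3.Connection, run_accessions: Iterable[str]) -> List[str]:
    """Return the requested run accessions that have no row in the sra table.

    Assumes the SRAmetadb.sqlite layout implied by the query in this thread:
    a table named ``sra`` with a ``run_accession`` column.
    """
    wanted = list(dict.fromkeys(run_accessions))  # de-duplicate, keep input order
    if not wanted:
        return []
    # One "?" placeholder per accession, so values are bound, not spliced in.
    placeholders = ",".join("?" for _ in wanted)
    rows = conn.execute(
        f"SELECT DISTINCT run_accession FROM sra WHERE run_accession IN ({placeholders})",
        wanted,
    ).fetchall()
    found = {row[0] for row in rows}
    # Anything requested but not returned is missing from the database.
    return [acc for acc in wanted if acc not in found]


if __name__ == "__main__":
    # Hypothetical in-memory stand-in for SRAmetadb.sqlite, holding only the
    # two runs the quoted query actually returned.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sra (run_accession TEXT, submission_accession TEXT)")
    conn.executemany(
        "INSERT INTO sra VALUES (?, ?)",
        [("SRR031766", "SRA010243"), ("SRR031767", "SRA010243")],
    )
    print(missing_runs(conn, ["SRR031766", "SRR031767", "SRR074430"]))
    # prints ['SRR074430']
```

To check a real download, open it with
`sqlite3.connect("SRAmetadb.sqlite")` instead of the demo table. SQLite
limits the number of bound parameters per statement (999 by default in
versions before 3.32.0), so very long accession lists would need to be
queried in chunks.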

Malcolm, we are currently looking into new mechanisms for updating
SRAdb, and hopefully the problem will be fixed soon.

Thanks again.

Jack



On 16 September 2011 07:06, Sean Davis <sdavis2 at mail.nih.gov> wrote:
> Sorry, Malcolm.
>
> We'll look into it.  Thanks for the report.
>
> Sean
>
>
> On Wed, Sep 14, 2011 at 5:09 PM, Cook, Malcolm <MEC at stowers.org> wrote:
>> Hi Sean, Jack, and fellow SRAdb users,
>>
>> Sean, I failed to cc you the first time around. Perhaps you have a suggestion for me?
>>
>> I remain perplexed as to why selected SRA runs fail to appear in SRAdb.
>>
>> Does anyone else have experience with, or advice on, this?
>>
>> Thanks much,
>>
>> ~Malcolm
>>
>>
>> -----Original Message-----
>> From: Cook, Malcolm
>> Sent: Friday, September 09, 2011 4:15 PM
>> To: 'bioc-devel at r-project.org'; 'zhujack at mail.nih.gov'
>> Subject: SRAdb missing runs
>>
>> Hi Jack and other SRAdb users,
>>
>> I find at least one SRA run missing from the SQLite database obtained from a fresh `getSRAdbFile()`.
>>
>> SRR074430 is present in the SRA: http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?cmd=viewer&m=data&s=viewer&run=SRR074430
>>
>> but querying the SQLite database directly with sqlite3 fails to find it:
>>
>> sqlite3 -list SRAmetadb.sqlite "select study_accession, submission_accession, sample_accession, experiment_accession, run_accession,  sample_alias from sra  where run_accession in ('SRR031766','SRR031767','SRR074430')"
>> SRP001537|SRA010243|SRS008471|SRX014483|SRR031766|S2_DRSC_CG10128_RNAi-1
>> SRP001537|SRA010243|SRS008471|SRX014483|SRR031767|S2_DRSC_CG10128_RNAi-1
>>
>> Can anyone advise me as to the origin of this discrepancy, or perhaps correct a misunderstanding I may have about using this resource?
>>
>> I just downloaded a fresh SRAdb file; here is the "Metadata associate with downloaded file:" output:
>>
>> c("schema version", "creation timestamp")c("1.0", "2011-09-03 10:38:16")
>>
>>
>> Below is a full transcript with sessionInfo(), if it helps.
>>
>> Thanks!
>>
>> Malcolm Cook
>> Computational Biology - Stowers Institute for Medical Research
>>
>>> library('SRAdb')
>>> sqlfile <- getSRAdbFile()
>> trying URL 'http://gbnci.abcc.ncifcrf.gov/backup/SRAmetadb.sqlite.gz'
>> Content type 'text/plain; charset=ISO-8859-1' length 38391904 bytes (36.6 Mb)
>> opened URL
>> ==================================================
>> downloaded 36.6 Mb
>>
>> Unzipping...
>>
>> Metadata associate with downloaded file:
>>
>> c("schema version", "creation timestamp")c("1.0", "2011-09-03 10:38:16")
>>> sessionInfo()
>> R version 2.13.1 (2011-07-08)
>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>
>> locale:
>> [1] C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] SRAdb_1.6.0    RCurl_1.5-0    bitops_1.0-4.1 graph_1.30.0   RSQLite_0.9-4
>> [6] DBI_0.2-5
>>
>> loaded via a namespace (and not attached):
>> [1] Biobase_2.12.2  GEOquery_2.19.2 XML_3.4-0       tools_2.13.1
>>> q('no')
>> bash-3.2$    sqlite3 -list SRAmetadb.sqlite "select study_accession, submission_accession, sample_accession, experiment_accession, run_accession,  sample_alias from sra  where run_accession in ('SRR031766','SRR031767','SRR074430')"
>> SRP001537|SRA010243|SRS008471|SRX014483|SRR031766|S2_DRSC_CG10128_RNAi-1
>> SRP001537|SRA010243|SRS008471|SRX014483|SRR031767|S2_DRSC_CG10128_RNAi-1
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>


