[Bioc-devel] SRAdb missing runs

Cook, Malcolm MEC at stowers.org
Tue Oct 11 21:37:32 CEST 2011


Hi Sean,

Hmm.  I thought I _had_ already updated the database but, lo, on trying again, guess what: the query now returns 1 row.

Hooray, excellent, bravo, and thanks for your sleuthing.

~Malcolm

> -----Original Message-----
> From: seandavi at gmail.com [mailto:seandavi at gmail.com] On Behalf Of Sean
> Davis
> Sent: Saturday, October 08, 2011 6:09 PM
> To: Cook, Malcolm
> Cc: Jack Zhu; bioc-devel at r-project.org
> Subject: Re: [Bioc-devel] SRAdb missing runs
> 
> On Fri, Oct 7, 2011 at 11:28 PM, Cook, Malcolm <MEC at stowers.org> wrote:
> > Jack & Sean,
> >
> > I just checked and found that the latest released version of SRAdb is
> > SRAdb_1.6.0 for R version 2.13.1.
> 
> Hi, Malcolm.
> 
> You probably will not need to update the SRAdb package immediately.  To
> address the questions you have below, it should suffice to
> update the database using getSRAdbFile().
> 
> > Is there anything I can do to avail myself of your changes short of running
> > with development R/BioC (or putting rewrite rules in my proxy ;)?
> >
> > Has NCBI acknowledged the issue you reported as being on their side?
> 
> They looked a few days later and did not find the problem.  Upon
> updating our database locally, the problem appeared to be fixed.
> 
> > I am faced again with this problem, on a different SRA study (this being the
> > 2nd time I've wanted to use SRAdb).
> >
> > Would you be able to confirm for me that using the XML from EBI fixes the
> > issue for the following study? (Of course, I understand if not.)
> >
> > I find that no rows are returned by
> >        sqliteQuickSQL(sra_con,'select * from study where study_accession = "SRP004442"')
> >
> 
> Using the SRAdb database file downloaded today and built on
> 2011-10-04, this query returns 1 row.
> 
> Thanks for your patience, Malcolm.
> 
> Sean
> 
> >
> >> -----Original Message-----
> >> From: yuelin at gmail.com [mailto:yuelin at gmail.com] On Behalf Of Jack
> Zhu
> >> Sent: Tuesday, October 04, 2011 4:10 PM
> >> To: Sean Davis
> >> Cc: Cook, Malcolm; bioc-devel at r-project.org
> >> Subject: Re: [Bioc-devel] SRAdb missing runs
> >>
> >> Hi Malcolm,
> >>
> >> Recently one other user also found missing SRA records in the SRAdb
> >> database.  I looked into the problem, and it looks like the problem
> >> was with the XML files on the NCBI SRA ftp
> >> site. So I modified the package and switched the main download
> >> source of the SRA XML files to EBI.  It seems to be working now.  Please let
> >> me know if you still see any problems.  Thanks.
> >>
> >> Jack
> >>
> >>
> >> On 19 September 2011 08:41, Sean Davis <sdavis2 at mail.nih.gov> wrote:
> >> > Hi, Malcolm.  I submitted a ticket to SRA.  They have assigned the
> >> > ticket already.  We'll keep you updated on the outcome as it
> >> > definitely impacts the utilization of SRA by us (SRAdb) and others.
> >> >
> >> > Sean
> >> >
> >> >
> >> > On Mon, Sep 19, 2011 at 8:25 AM, Cook, Malcolm <MEC at stowers.org>
> >> wrote:
> >> >> Jack,
> >> >>
> >> >> Thanks for the reply.
> >> >>
> >> >> I'm actually not that savvy about the internals of SRA and GEO at
> >> >> NCBI.  I've cobbled together my first RNA-seq submission to GEO, which in
> >> >> turn submits to SRA.  The reads in question are from the modENCODE project,
> >> >> which submits to GEO, which submits to SRA.  I've not tried to deconstruct the
> >> >> reason why some of these files have gone missing from the XML.  Do you
> >> >> think this is something to report to modENCODE, GEO, or NCBI?
> >> >>
> >> >> Cheers,
> >> >>
> >> >> Malcolm
> >> >>
> >> >> ________________________________________
> >> >> From: yuelin at gmail.com [yuelin at gmail.com] On Behalf Of Jack Zhu
> >> [zhujack at mail.nih.gov]
> >> >> Sent: Friday, September 16, 2011 10:21 PM
> >> >> To: Cook, Malcolm
> >> >> Cc: bioc-devel at r-project.org; Sean Davis
> >> >> Subject: Re: [Bioc-devel] SRAdb missing runs
> >> >>
> >> >> Hi Malcolm,
> >> >>
> >> >> I am really sorry that I missed your post, but thank you very much for
> >> >> the report.
> >> >>
> >> >> I have reproduced the problem you found.  After a little digging, it
> >> >> looks like the missing runs in SRAdb are caused by NCBI
> >> >> failing to update the XML files.
> >> >>
> >> >> As you know, all the data in SRAdb comes from the NCBI SRA XML files,
> >> >> which are downloaded from the NCBI ftp site
> >> >> (ftp://ftp.ncbi.nih.gov/sra/Submissions/).  As shown on this page,
> >> >> http://www.ncbi.nlm.nih.gov/sra/SRX032508, SRR074430 was submitted
> >> >> through SRA010243. Unfortunately, the SRA010243 XML file on the NCBI
> >> >> ftp site (ftp://ftp.ncbi.nih.gov/sra/Submissions/SRA010/SRA010243/)
> >> >> does not include SRR074430 and SRX032508, apparently because the
> >> >> XML files were not updated when new runs/samples were added.
> >> >>
> >> >> Malcolm, currently we are looking into new mechanisms to update SRAdb
> >> >> and hopefully the problem will be fixed soon.
> >> >>
> >> >> Thanks again.
> >> >>
> >> >> Jack
> >> >>
> >> >>
> >> >>
> >> >> On 16 September 2011 07:06, Sean Davis <sdavis2 at mail.nih.gov>
> wrote:
> >> >>> Sorry, Malcolm.
> >> >>>
> >> >>> We'll look into it.  Thanks for the report.
> >> >>>
> >> >>> Sean
> >> >>>
> >> >>>
> >> >>> On Wed, Sep 14, 2011 at 5:09 PM, Cook, Malcolm
> <MEC at stowers.org>
> >> wrote:
> >> >>>> Hi Sean, Jack, and fellow SRAdb users,
> >> >>>>
> >> >>>> Sean, I failed to cc: you 1st time around.  Perhaps you have a suggestion for me....???
> >> >>>>
> >> >>>> I remain perplexed as to why selected SRA runs fail to appear in SRAdb.
> >> >>>>
> >> >>>> Does anyone else have some experience/advice on this?
> >> >>>>
> >> >>>> Thanks much,
> >> >>>>
> >> >>>> ~Malcolm
> >> >>>>
> >> >>>>
> >> >>>> -----Original Message-----
> >> >>>> From: Cook, Malcolm
> >> >>>> Sent: Friday, September 09, 2011 4:15 PM
> >> >>>> To: 'bioc-devel at r-project.org'; 'zhujack at mail.nih.gov'
> >> >>>> Subject: SRAdb missing runs
> >> >>>>
> >> >>>> Hi Jack and other SRAdb users,
> >> >>>>
> >> >>>> I find at least one SRA run missing from the sqlite database obtained from a fresh `getSRAdbFile()`.
> >> >>>>
> >> >>>> SRR074430 is present in the SRA:
> >> >>>> http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?cmd=viewer&m=data&s=viewer&run=SRR074430
> >> >>>>
> >> >>>> but directly querying the sqlite3 database fails to find it:
> >> >>>>
> >> >>>> sqlite3 -list SRAmetadb.sqlite "select study_accession, submission_accession, sample_accession, experiment_accession, run_accession,  sample_alias from sra  where run_accession in ('SRR031766','SRR031767','SRR074430')"
> >> >>>>
> >> >>>> SRP001537|SRA010243|SRS008471|SRX014483|SRR031766|S2_DRSC_CG10128_RNAi-1
> >> >>>> SRP001537|SRA010243|SRS008471|SRX014483|SRR031767|S2_DRSC_CG10128_RNAi-1
> >> >>>>
> >> >>>> Can anyone advise me as to the origin of this discrepancy, or perhaps fix a misunderstanding I may have in using this resource?
> >> >>>>
> >> >>>> I just downloaded a fresh SRAdbFile...  here is the "Metadata associate with downloaded file:"
> >> >>>>
> >> >>>> c("schema version", "creation timestamp")c("1.0", "2011-09-03 10:38:16")
> >> >>>>
> >> >>>> Below is a full transcript with SessionInfo(), if it helps.
> >> >>>>
> >> >>>> Thanks!
> >> >>>>
> >> >>>> Malcolm Cook
> >> >>>> Computational Biology - Stowers Institute for Medical Research
> >> >>>>
> >> >>>>> library('SRAdb')
> >> >>>>> sqlfile <- getSRAdbFile()
> >> >>>> trying URL 'http://gbnci.abcc.ncifcrf.gov/backup/SRAmetadb.sqlite.gz'
> >> >>>> Content type 'text/plain; charset=ISO-8859-1' length 38391904 bytes (36.6 Mb)
> >> >>>> opened URL
> >> >>>> ==================================================
> >> >>>> downloaded 36.6 Mb
> >> >>>>
> >> >>>> Unzipping...
> >> >>>>
> >> >>>> Metadata associate with downloaded file:
> >> >>>>
> >> >>>> c("schema version", "creation timestamp")c("1.0", "2011-09-03 10:38:16")
> >> >>>>> sessionInfo()
> >> >>>> R version 2.13.1 (2011-07-08)
> >> >>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
> >> >>>>
> >> >>>> locale:
> >> >>>> [1] C
> >> >>>>
> >> >>>> attached base packages:
> >> >>>> [1] stats     graphics  grDevices utils     datasets  methods   base
> >> >>>>
> >> >>>> other attached packages:
> >> >>>> [1] SRAdb_1.6.0    RCurl_1.5-0    bitops_1.0-4.1 graph_1.30.0   RSQLite_0.9-4
> >> >>>> [6] DBI_0.2-5
> >> >>>>
> >> >>>> loaded via a namespace (and not attached):
> >> >>>> [1] Biobase_2.12.2  GEOquery_2.19.2 XML_3.4-0       tools_2.13.1
> >> >>>>> q('no')
> >> >>>> bash-3.2$ sqlite3 -list SRAmetadb.sqlite "select study_accession, submission_accession, sample_accession, experiment_accession, run_accession,  sample_alias from sra  where run_accession in ('SRR031766','SRR031767','SRR074430')"
> >> >>>> SRP001537|SRA010243|SRS008471|SRX014483|SRR031766|S2_DRSC_CG10128_RNAi-1
> >> >>>> SRP001537|SRA010243|SRS008471|SRX014483|SRR031767|S2_DRSC_CG10128_RNAi-1
> >> >>>>
> >> >>>> _______________________________________________
> >> >>>> Bioc-devel at r-project.org mailing list
> >> >>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >> >>>>

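A minimal sketch of the presence check discussed in this thread: given a set of run accessions, report which ones have no row in the `sra` table. It runs the same kind of SQL as the sqlite3 command quoted above, but through Python's standard sqlite3 module and against a toy in-memory table rather than the real SRAmetadb.sqlite. The helper name missing_runs and the reduced table are assumptions made for illustration and are not part of SRAdb; the two rows are the ones the original query returned.

```python
import sqlite3


def missing_runs(con, wanted):
    """Return the accessions in `wanted` that have no row in the `sra` table.

    `missing_runs` is a hypothetical helper for illustration, not an SRAdb function.
    """
    if not wanted:
        return []
    # One "?" placeholder per accession, so values are bound rather than pasted into the SQL.
    placeholders = ",".join("?" for _ in wanted)
    rows = con.execute(
        "SELECT DISTINCT run_accession FROM sra "
        f"WHERE run_accession IN ({placeholders})",
        list(wanted),
    ).fetchall()
    found = {row[0] for row in rows}
    # Preserve the caller's ordering in the result.
    return [acc for acc in wanted if acc not in found]


# Toy stand-in for SRAmetadb.sqlite (assumption): only the columns used in the
# thread's query, holding the two rows that query returned.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE sra ("
    "study_accession TEXT, submission_accession TEXT, sample_accession TEXT, "
    "experiment_accession TEXT, run_accession TEXT, sample_alias TEXT)"
)
con.executemany(
    "INSERT INTO sra VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("SRP001537", "SRA010243", "SRS008471", "SRX014483",
         "SRR031766", "S2_DRSC_CG10128_RNAi-1"),
        ("SRP001537", "SRA010243", "SRS008471", "SRX014483",
         "SRR031767", "S2_DRSC_CG10128_RNAi-1"),
    ],
)

wanted = ["SRR031766", "SRR031767", "SRR074430"]
print(missing_runs(con, wanted))  # prints ['SRR074430']
```

To run this against a real download, replace the in-memory connection with sqlite3.connect("SRAmetadb.sqlite"). An empty result after refreshing the file would correspond to the fix reported at the top of this message.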
