[Bioc-devel] SRAdb missing runs

Cook, Malcolm MEC at stowers.org
Sat Oct 8 05:28:38 CEST 2011


Jack & Sean,

I just checked and found that the latest released version of SRAdb is SRAdb_1.6.0, for R version 2.13.1.

Is there anything I can do to avail myself of your changes short of running with development R/BioC (or putting rewrite rules in my proxy ;)?

Has NCBI acknowledged the issue you reported as being on their side?

I am facing this problem again, on a different SRA study (this being the second time I've wanted to use SRAdb).

Would you be able to confirm for me that using the XML from EBI fixes the issue for the following study? (Of course, I understand if not.)

I find that no rows are returned by
	sqliteQuickSQL(sra_con,'select * from study where study_accession = "SRP004442"')

even though it exists: http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?study=SRP004442
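A minimal Python sketch of the same presence check, assuming a local copy of `SRAmetadb.sqlite` whose `study` table has a `study_accession` column (as the query above implies). It binds the accession as a parameter instead of writing `"SRP004442"` inline: in SQLite, double quotes denote an identifier, and such a token is only treated as a string literal when no column of that name exists. The helper name `study_exists` and the one-column stand-in table are hypothetical.

```python
import sqlite3


def study_exists(con: sqlite3.Connection, accession: str) -> bool:
    """Return True if the study table contains the given accession.

    Hypothetical helper; assumes a `study` table with a
    `study_accession` column, as in SRAmetadb.sqlite.
    """
    # '?' binds the accession as a string value, avoiding the
    # double-quote (identifier) vs single-quote (literal) ambiguity.
    row = con.execute(
        "SELECT 1 FROM study WHERE study_accession = ? LIMIT 1",
        (accession,),
    ).fetchone()
    return row is not None


if __name__ == "__main__":
    # In-memory stand-in for SRAmetadb.sqlite; the real study table has
    # many more columns. Use sqlite3.connect("SRAmetadb.sqlite") instead.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE study (study_accession TEXT)")
    con.execute("INSERT INTO study VALUES ('SRP001537')")
    print(study_exists(con, "SRP001537"))  # True
    print(study_exists(con, "SRP004442"))  # False
```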

This study is coming in via GEO, which is part of NCBI, so it surprises me that NCBI's own SRA should have it wrong.

Best,

~Malcolm


> -----Original Message-----
> From: yuelin at gmail.com [mailto:yuelin at gmail.com] On Behalf Of Jack Zhu
> Sent: Tuesday, October 04, 2011 4:10 PM
> To: Sean Davis
> Cc: Cook, Malcolm; bioc-devel at r-project.org
> Subject: Re: [Bioc-devel] SRAdb missing runs
> 
> Hi Malcolm,
> 
> Recently one other user also found missing SRA records in the SRAdb
> database.  I looked into the problem, and it looks like the problem
> was with the XML files on the NCBI SRA FTP
> site. So I modified the package and switched the main download
> source of the SRA XML files to EBI.  It seems to be working now.  Please let
> me know if you still see any problems.  Thanks.
> 
> Jack
> 
> 
> On 19 September 2011 08:41, Sean Davis <sdavis2 at mail.nih.gov> wrote:
> > Hi, Malcolm.  I submitted a ticket to SRA.  They have assigned the
> > ticket already.  We'll keep you updated on the outcome as it
> > definitely impacts the utilization of SRA by us (SRAdb) and others.
> >
> > Sean
> >
> >
> > On Mon, Sep 19, 2011 at 8:25 AM, Cook, Malcolm <MEC at stowers.org> wrote:
> >> Jack,
> >>
> >> Thanks for the reply.
> >>
> >> I'm actually not that savvy about the internals of SRA and GEO at
> >> NCBI.  I've cobbled together my first RNA-seq submission to GEO, which in
> >> turn submits to SRA.  The reads in question are from the modENCODE project,
> >> which submits to GEO, which submits to SRA.  I've not tried to deconstruct the
> >> reason why some of these files have gone missing from the XML.  Do you
> >> think this is something to report to modENCODE, GEO, or NCBI?
> >>
> >> Cheers,
> >>
> >> Malcolm
> >>
> >> ________________________________________
> >> From: yuelin at gmail.com [yuelin at gmail.com] On Behalf Of Jack Zhu [zhujack at mail.nih.gov]
> >> Sent: Friday, September 16, 2011 10:21 PM
> >> To: Cook, Malcolm
> >> Cc: bioc-devel at r-project.org; Sean Davis
> >> Subject: Re: [Bioc-devel] SRAdb missing runs
> >>
> >> Hi Malcolm,
> >>
> >> I am really sorry that I missed your post, but thank you very much for
> >> the report.
> >>
> >> I have reproduced the problem you found.  After a little investigation, it
> >> looks like the missing runs in SRAdb are caused by NCBI
> >> failing to update the XML files.
> >>
> >> As you know, all the data in SRAdb come from NCBI SRA XML files,
> >> which are downloaded from the NCBI FTP site
> >> (ftp://ftp.ncbi.nih.gov/sra/Submissions/).  As shown on this page,
> >> http://www.ncbi.nlm.nih.gov/sra/SRX032508, SRR074430 was submitted
> >> through SRA010243. Unfortunately, the SRA010243 XML file on the NCBI
> >> FTP site (ftp://ftp.ncbi.nih.gov/sra/Submissions/SRA010/SRA010243/)
> >> does not include SRR074430 or SRX032508, which is apparently the result
> >> of a failure to update the XML files when new runs/samples were added.
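The check described here, whether a run accession appears in a submission's XML, can be sketched with Python's standard `xml.etree.ElementTree`. This assumes the run records are `RUN` elements carrying an `accession` attribute (our understanding of the SRA run XML layout, not confirmed in this thread); the trimmed sample document and the helper name `run_accessions` are hypothetical, not the actual contents of the SRA010243 files.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed run XML for illustration only.
SAMPLE_RUN_XML = """
<RUN_SET>
  <RUN accession="SRR031766"/>
  <RUN accession="SRR031767"/>
</RUN_SET>
"""


def run_accessions(xml_text: str) -> set[str]:
    """Collect the accession attribute of every RUN element."""
    root = ET.fromstring(xml_text.strip())
    # iter() walks the whole tree, including the root element itself.
    return {run.get("accession") for run in root.iter("RUN") if run.get("accession")}


if __name__ == "__main__":
    runs = run_accessions(SAMPLE_RUN_XML)
    for acc in ("SRR031766", "SRR031767", "SRR074430"):
        print(acc, "present" if acc in runs else "MISSING")
```

Run against a real submission file (read from disk instead of the inline string), any accession reported as MISSING is one that SRAdb could not have picked up from that XML.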
> >>
> >> Malcolm, currently we are looking into new mechanisms to update SRAdb
> >> and hopefully the problem will be fixed soon.
> >>
> >> Thanks again.
> >>
> >> Jack
> >>
> >>
> >>
> >> On 16 September 2011 07:06, Sean Davis <sdavis2 at mail.nih.gov> wrote:
> >>> Sorry, Malcolm.
> >>>
> >>> We'll look into it.  Thanks for the report.
> >>>
> >>> Sean
> >>>
> >>>
> >>> On Wed, Sep 14, 2011 at 5:09 PM, Cook, Malcolm <MEC at stowers.org> wrote:
> >>>> Hi Sean, Jack, and fellow SRAdb users,
> >>>>
> >>>> Sean, I failed to cc you the first time around.  Perhaps you have a
> >>>> suggestion for me?
> >>>>
> >>>> I remain perplexed as to why certain SRA runs fail to appear in SRAdb.
> >>>>
> >>>> Does anyone else have experience with, or advice on, this?
> >>>>
> >>>> Thanks much,
> >>>>
> >>>> ~Malcolm
> >>>>
> >>>>
> >>>> -----Original Message-----
> >>>> From: Cook, Malcolm
> >>>> Sent: Friday, September 09, 2011 4:15 PM
> >>>> To: 'bioc-devel at r-project.org'; 'zhujack at mail.nih.gov'
> >>>> Subject: SRAdb missing runs
> >>>>
> >>>> Hi Jack and other SRAdb users,
> >>>>
> >>>> I find at least one SRA run missing from the SQLite database obtained
> >>>> from a fresh `getSRAdbFile()`.
> >>>>
> >>>> SRR074430 is present in the SRA:
> >>>> http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?cmd=viewer&m=data&s=viewer&run=SRR074430
> >>>>
> >>>> but directly querying the sqlite3 database fails to find it:
> >>>>
> >>>> sqlite3 -list SRAmetadb.sqlite "select study_accession, submission_accession, sample_accession, experiment_accession, run_accession, sample_alias from sra where run_accession in ('SRR031766','SRR031767','SRR074430')"
> >>>> SRP001537|SRA010243|SRS008471|SRX014483|SRR031766|S2_DRSC_CG10128_RNAi-1
> >>>> SRP001537|SRA010243|SRS008471|SRX014483|SRR031767|S2_DRSC_CG10128_RNAi-1
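To make such a discrepancy explicit, instead of inferring it from which rows come back, one can compute the set difference between the requested accessions and those the query returns. A minimal Python sketch, assuming the `sra` table and `run_accession` column used in the query above; the helper name `missing_runs` and the in-memory stand-in table are hypothetical.

```python
import sqlite3


def missing_runs(con: sqlite3.Connection, wanted: list[str]) -> list[str]:
    """Return the accessions in `wanted` that have no row in the sra table."""
    if not wanted:
        return []
    # One '?' placeholder per accession for the IN (...) clause.
    placeholders = ",".join("?" * len(wanted))
    found = {
        acc
        for (acc,) in con.execute(
            f"SELECT run_accession FROM sra WHERE run_accession IN ({placeholders})",
            wanted,
        )
    }
    # Preserve the caller's order in the result.
    return [acc for acc in wanted if acc not in found]


if __name__ == "__main__":
    # In-memory stand-in mirroring the two rows returned above;
    # use sqlite3.connect("SRAmetadb.sqlite") for the real file.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sra (run_accession TEXT)")
    con.executemany("INSERT INTO sra VALUES (?)", [("SRR031766",), ("SRR031767",)])
    print(missing_runs(con, ["SRR031766", "SRR031767", "SRR074430"]))
    # ['SRR074430']
```

For long accession lists, note that SQLite caps the number of host parameters per statement (SQLITE_MAX_VARIABLE_NUMBER, 999 by default before version 3.32.0), so very large lists would need to be queried in chunks.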
> >>>>
> >>>> Can anyone advise me as to the origin of this discrepancy, or perhaps correct a
> >>>> misunderstanding I may have about using this resource?
> >>>>
> >>>> I just downloaded a fresh SRAdb file. Here is its "Metadata associate with downloaded file:"
> >>>>
> >>>> c("schema version", "creation timestamp")c("1.0", "2011-09-03 10:38:16")
> >>>>
> >>>>
> >>>> Below is a full transcript with sessionInfo(), if it helps.
> >>>>
> >>>> Thanks!
> >>>>
> >>>> Malcolm Cook
> >>>> Computational Biology - Stowers Institute for Medical Research
> >>>>
> >>>>> library('SRAdb')
> >>>>> sqlfile <- getSRAdbFile()
> >>>> trying URL 'http://gbnci.abcc.ncifcrf.gov/backup/SRAmetadb.sqlite.gz'
> >>>> Content type 'text/plain; charset=ISO-8859-1' length 38391904 bytes (36.6 Mb)
> >>>> opened URL
> >>>> ==================================================
> >>>> downloaded 36.6 Mb
> >>>>
> >>>> Unzipping...
> >>>>
> >>>> Metadata associate with downloaded file:
> >>>>
> >>>> c("schema version", "creation timestamp")c("1.0", "2011-09-03 10:38:16")
> >>>>> sessionInfo()
> >>>> R version 2.13.1 (2011-07-08)
> >>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
> >>>>
> >>>> locale:
> >>>> [1] C
> >>>>
> >>>> attached base packages:
> >>>> [1] stats     graphics  grDevices utils     datasets  methods   base
> >>>>
> >>>> other attached packages:
> >>>> [1] SRAdb_1.6.0    RCurl_1.5-0    bitops_1.0-4.1 graph_1.30.0   RSQLite_0.9-4
> >>>> [6] DBI_0.2-5
> >>>>
> >>>> loaded via a namespace (and not attached):
> >>>> [1] Biobase_2.12.2  GEOquery_2.19.2 XML_3.4-0       tools_2.13.1
> >>>>> q('no')
> >>>> bash-3.2$ sqlite3 -list SRAmetadb.sqlite "select study_accession, submission_accession, sample_accession, experiment_accession, run_accession, sample_alias from sra where run_accession in ('SRR031766','SRR031767','SRR074430')"
> >>>> SRP001537|SRA010243|SRS008471|SRX014483|SRR031766|S2_DRSC_CG10128_RNAi-1
> >>>> SRP001537|SRA010243|SRS008471|SRX014483|SRR031767|S2_DRSC_CG10128_RNAi-1
> >>>>
> >>>> _______________________________________________
> >>>> Bioc-devel at r-project.org mailing list
> >>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >>>>
> >>>
> >>
> >


