[Bioc-devel] Bioconductor archive?
Lluís Revilla
lluis.revilla at gmail.com
Thu Mar 20 21:25:00 CET 2025
Thanks Sean,
This looks awesome! Many thanks for storing this. I'll look into how I
could process the data and might contact you off-list or via the repo's
issues.
Judging by the numbers reported, I'm a bit surprised by the daily
increment of the summary table. Bioconductor software has around 2,000
packages, checked on 5 different machines, with 5 outputs each (install,
build, check, bin, propagate), which gives a number of that order of
magnitude; but not all builds and checks are run every day (right now I
cannot find the page where the frequency is reported).
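A back-of-the-envelope version of the estimate above, as a small Python sketch. It assumes one row per package x machine x build step (as in Sean's description of the summary table below) and uses only the approximate figures quoted in this thread: 2,000 packages, 5 machines, 5 outputs, about 20k new rows per day and about 6M rows in total. The constant names are illustrative.

```python
# Rough consistency check of the build_summary growth rate discussed in this thread.
# Assumption: one row per package x machine x build step; all counts are the
# approximate figures quoted in the thread, not exact values.
PACKAGES = 2000  # approximate number of Bioconductor software packages
MACHINES = 5     # build machines
STEPS = 5        # install, build, check, bin, propagate

rows_per_full_build = PACKAGES * MACHINES * STEPS

reported_rows_per_day = 20_000   # reported daily increment of build_summary
reported_total_rows = 6_000_000  # reported current size of build_summary

# Share of a complete package x machine x step matrix added per day.
fraction_of_full_build = reported_rows_per_day / rows_per_full_build
# Days of history implied by the total size at the reported daily rate.
implied_days = reported_total_rows / reported_rows_per_day

print(f"rows per full build: {rows_per_full_build}")
print(f"daily increment as a share of a full build: {fraction_of_full_build:.0%}")
print(f"days of history implied: {implied_days:.0f}")
```

Under these assumptions a complete build matrix would add about 50,000 rows, so 20k rows per day is roughly 40% of it, which fits the point that not every build and check runs daily; 6M rows at that rate is about 300 days, in line with the "year or so" of history Sean mentions. The thread does not establish the actual build schedule, so the gap is a hint rather than an explanation.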
For now I won't use the build and check reports, but I might be
interested in them later (I also collect general check results from
CRAN, without the log files).
In any case, I'll get in touch.
Ideally, I would like to export/use this from a package, as I have
done for CRAN via the repo.data package I'm building.
Best wishes and many thanks,
Lluís
On Thu, 20 Mar 2025 at 02:56, Sean Davis <seandavi using gmail.com> wrote:
>
> Hi, all.
>
> Perhaps a bit tangential, but I capture the results of all build reports for all packages daily (that is the intent, anyway) going back a year or so (a couple of years if we dig into archives). The reports are processed using code in this repo: https://github.com/seandavi/BiocBuildDB using a github action that runs daily. This might not be exactly the format you are looking for, Lluis, but it does have a complete history of every build for every package for every day for all Bioc builds.
>
> The result is a set of three CSV files (one set for every build, about 3.5k CSV files right now) with rows for each package/machine/build step and the results of the build, including propagation status (whether the package gets pushed to release). Version numbers, git hashes, dates, Bioconductor versions, build commands, error logs, etc. are all captured. Thus, things like full text search over captured log output is possible over time, across branches, and across machines or packages. When a package enters the system is captured. The build_summary table currently checks in at about 6M rows (again, without going into archive data) and adds about 20k rows per day.
>
> I have pending issues to expose the data but just haven’t prioritized the work. I’m happy to discuss access and use cases either in a new thread here, on Slack, or via github issues.
>
> Sean
>
> From: Bioc-devel <bioc-devel-bounces using r-project.org> on behalf of Lluís Revilla <lluis.revilla using gmail.com>
> Date: Wednesday, March 19, 2025 at 6:21 PM
> To: Kern, Lori <Lori.Shepherd using roswellpark.org>
> Cc: bioc-devel <bioc-devel using r-project.org>
> Subject: Re: [Bioc-devel] Bioconductor archive?
>
> Hi Lori,
>
> Many thanks for your answer. I have a couple of follow-up questions.
>
> > It looks like the Date/Publication field is only present when there was a change on the branch post-release (i.e., any package that has a version x.y.(z+n) instead of x.y.0).
> > After a release is frozen and a new release occurs, Bioconductor does not allow any changes, not even bug fixes. A release is frozen, so there are no changes after the new release occurs.
>
> Thanks for reminding me of this. I'm interested in the x.y.(z+n)
> versions published during each release, not just the last one or
> the initial one. Is this historical information available? The file at
> https://bioconductor.org/packages/3.20/bioc/VIEWS only includes the
> date of the latest version in a given release, but a package could have
> had other releases within the same Bioconductor version before that.
>
> > I would have to dig into the history, but my guess is that 3.7 might be when we either switched to git or started keeping archived versions, so this is likely not available before then.
>
> I thought it would be difficult, if not impossible, to check this, but
> I can't find this data even for the current release. Does Bioconductor
> have an internal archive with this information? On CRAN, even when a
> package is removed, the archive activity is recorded internally: each
> date-time of publication, archival and removal. Does something similar
> happen in Bioconductor? Even if a given package is no longer available,
> knowing that a release happened could help reproducibility (it could be
> compared with the git log).
>
> With that information, finding which package versions were used by a
> script, given only a date, could become easier.
>
> Best,
>
> Lluís
>
> >
> > Lori Shepherd - Kern
> > Bioconductor Core Team
> > Roswell Park Comprehensive Cancer Center
> > Department of Biostatistics & Bioinformatics
> > Elm & Carlton Streets
> > Buffalo, New York 14263
> >
> > ________________________________
> > From: Bioc-devel <bioc-devel-bounces using r-project.org> on behalf of Lluís Revilla <lluis.revilla using gmail.com>
> > Sent: Saturday, March 15, 2025 5:20 AM
> > To: bioc-devel <bioc-devel using r-project.org>
> > Subject: [Bioc-devel] Bioconductor archive?
> >
> > Hi,
> >
> > Recently I learned, thanks to Martin Morgan, that there are some files with
> > the Date/Publication fields for Bioconductor packages, e.g.
> > https://bioconductor.org/packages/3.7/bioc/VIEWS
> > I'm trying to reconstruct which packages from CRAN and Bioconductor were
> > available at any given moment, and this was very helpful.
> >
> > However, these files only list the latest version of each package
> > published in a given Bioconductor release.
> > Is there a way to know whether there were more updates after a release?
> > I thought about searching the git log for each package, but that wouldn't
> > be enough, as a package might have increased its version without passing
> > Bioconductor checks, and thus never been released.
> >
> > Relatedly, this field is present from Bioconductor version 3.7 onward,
> > but I couldn't find it in previous releases. Is there a way to know about
> > earlier package releases and their dates?
> >
> > Package updates on the release branch should only contain bug fixes, but
> > for reproducibility purposes it might be necessary to get the same bugs
> > again.
> >
> > Many thanks in advance,
> >
> > Lluís
> >
> > _______________________________________________
> > Bioc-devel using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
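For the reconstruction Lluís describes, the VIEWS files mentioned in this thread (e.g. https://bioconductor.org/packages/3.20/bioc/VIEWS) can be read programmatically. The following is a minimal Python sketch, assuming the files use the same Debian-control-style layout as R's PACKAGES files (blank-line-separated records of "Field: value" lines, with indented continuation lines) and are UTF-8 encoded. The helpers parse_dcf, publication_dates and fetch_views are hypothetical names, not part of any Bioconductor tooling. As Lori notes, Date/Publication is only present for packages changed after release, so the sketch returns None for the rest.

```python
from typing import Dict, List, Optional, Tuple
from urllib.request import urlopen


def fetch_views(url: str) -> str:
    """Download a VIEWS file as text (assumes UTF-8 encoding)."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")


def parse_dcf(text: str) -> List[Dict[str, str]]:
    """Split Debian-control-style text into one dict per record.

    Records are separated by blank lines; lines starting with whitespace
    continue the previous field (e.g. a wrapped Description).
    """
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    last_key: Optional[str] = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current record.
            if current:
                records.append(current)
            current, last_key = {}, None
        elif line[0] in " \t" and last_key is not None:
            # Continuation line: append to the previous field's value.
            current[last_key] += " " + line.strip()
        else:
            # Split on the first colon only, so values containing ":" survive.
            key, sep, value = line.partition(":")
            if sep:
                last_key = key.strip()
                current[last_key] = value.strip()
    if current:
        records.append(current)
    return records


def publication_dates(text: str) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each package to (Version, Date/Publication); the date is None if absent."""
    return {
        rec["Package"]: (rec.get("Version"), rec.get("Date/Publication"))
        for rec in parse_dcf(text)
        if "Package" in rec
    }


if __name__ == "__main__":
    # Made-up records for illustration only (not real Bioconductor data).
    sample = (
        "Package: pkgA\n"
        "Version: 1.2.0\n"
        "\n"
        "Package: pkgB\n"
        "Version: 1.4.3\n"
        "Date/Publication: 2018-06-01\n"
    )
    print(publication_dates(sample))
    # {'pkgA': ('1.2.0', None), 'pkgB': ('1.4.3', '2018-06-01')}
    # For real data (network access required), something like:
    # publication_dates(fetch_views("https://bioconductor.org/packages/3.20/bioc/VIEWS"))
```

Within R itself, base read.dcf() parses the same format, which may be the more natural route inside an R package.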