[Bioc-devel] Shouldn't we distinguish between package-specific and dependency errors?

Morgan, Martin Martin.Morgan at roswellpark.org
Thu Sep 24 22:57:34 CEST 2015


> -----Original Message-----
> From: Bioc-devel [mailto:bioc-devel-bounces at r-project.org] On Behalf Of
> Michael Lawrence
> Sent: Thursday, September 24, 2015 2:51 PM
> To: Ludwig Geistlinger
> Cc: bioc-devel at r-project.org
> Subject: Re: [Bioc-devel] Shouldn't we distinguish between package-specific
> and dependency errors?]
> 
> The important question is whether the package actually works, as
> distributed. If not, it's a user matter. If a build is failing because there is a
> problem with the "next" version of the package, or something specific to the
> build machine, it's a developer/admin matter. I'm guessing we don't
> routinely test packages without version bumps, but perhaps we should, at
> least when their deps change. Maybe certain packages that depend on
> external resources could be tested on a regular but less frequent basis,
> regardless.

Packages are built and checked nightly, regardless of version bump. Only version bumps (and successful build / check) trigger a push to the public bioc repository. The build errors that Ludwig is concerned about typically are the result of these nightly builds catching incompatible changes in other packages.
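
As a rough illustration of the propagation rule just described, here is a minimal R sketch. The function name should_propagate() and its arguments are hypothetical, made up for this example; it is not the build system's actual code, only one way the rule could be written down.

```r
# Hypothetical helper (not part of the Bioconductor build system):
# a nightly build result is pushed to the public repository only if the
# version was bumped AND both build and check succeeded.
should_propagate <- function(built_version, published_version,
                             build_ok, check_ok) {
    # package_version() compares version strings component by component
    bumped <- package_version(built_version) > package_version(published_version)
    bumped && build_ok && check_ok
}

# A version bump with a clean build and check is pushed
should_propagate("1.3.2", "1.3.1", build_ok = TRUE, check_ok = TRUE)
# No version bump: built and checked nightly, but nothing is pushed
should_propagate("1.3.1", "1.3.1", build_ok = TRUE, check_ok = TRUE)
```

Under this reading, a package whose check starts failing after an upstream change stays published at its last pushed version, which is why the shield and the installable version can disagree.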

In these cases the bioc packages that _are_ available via biocLite() (because they built before the incompatible change) are no longer valid; it seems particularly important to alert the user, including users who have already installed the bioc package, that there are problems. It is not possible to 'roll back' the Bioc package (because there is no guarantee that the older version worked, and because R installs newer versions, not older versions). In terms of our hypothetical reviewer, the shield accurately conveys the situation they would experience if they were to install the software.

It might be helpful to remember that the shields on the release and devel pages are independent of one another -- the carnage of a bad check-in of a new feature (in devel, of course!) is not reflected on the release landing pages.

Roughly, I view the top line of shields as particularly useful to users; the second line is more developer oriented but still conveying relevant information to our more ardent users. In both cases I think the shields do a good job of making problems more apparent to the community in general, and hence contribute to better overall software.

There are 'best practices' that package developers can follow to mitigate the consequences of API changes in their package, especially following a strict deprecation cycle; the separation of 'release' and 'devel' versions of Bioconductor facilitates this. Likewise, package developers have a responsibility to their users to convey problems 'upstream' to be fixed at the source.
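
To make the deprecation idea concrete, here is a minimal sketch in R. The functions old_summary() and new_summary() are hypothetical names used only for illustration; .Deprecated() is the base R helper that issues a warning pointing to the replacement.

```r
# Hypothetical replacement function introduced in the devel branch
new_summary <- function(x) mean(x)

# The old entry point keeps working for a release cycle, but warns
# callers so downstream packages have time to adapt.
old_summary <- function(x) {
    .Deprecated("new_summary")
    new_summary(x)
}

# Downstream code still gets a result, together with a deprecation warning
result <- suppressWarnings(old_summary(c(1, 2, 3)))
```

In a later cycle the old function could be changed to call .Defunct() before being removed, so that dependent packages see a warning first rather than a sudden failure.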

Bioconductor does have a more dense dependency graph than CRAN. Generally I think this is good, reflecting valuable software re-use rather than re-invention; the release / devel split also makes this approach viable when the dependencies are within Bioconductor. It is unfortunate when a domain specific package offers some functionality that is more generally useful, introducing a cascade of more-or-less irrelevant dependencies. In these cases it may well be worthwhile to identify the generally useful functionality and re-factor it into a new or different package, e.g., implementing or using rtracklayer::import(). If there are candidates for such re-factoring then the Bioc-devel mailing list is an appropriate venue for discussion.

Martin

> 
> 
> On Thu, Sep 24, 2015 at 11:19 AM, Ludwig Geistlinger <
> Ludwig.Geistlinger at bio.ifi.lmu.de> wrote:
> 
> > Dan, thanks for clarifying.
> > With 'we can hardly do much about it', I meant that we cannot prevent
> > that for external dependencies in the way we can prevent it for
> > dependencies within Bioc.
> >
> > Question remains whether the landing page for the USER of the package
> > is the right place to alert the DEVELOPER of the package.
> >
> > Best,
> > Ludwig
> >
> >
> > ----- Original Message -----
> > > From: "Ludwig Geistlinger" <Ludwig.Geistlinger at bio.ifi.lmu.de>
> > > To: "Dan Tenenbaum" <dtenenba at fredhutch.org>
> > > Sent: Thursday, September 24, 2015 10:52:29 AM
> > > Subject: Re: [Bioc-devel] Shouldn't we distinguish between
> > package-specific and dependency errors?
> > >
> > >
> > > Well, I guess, Dan, that basically means that breaking cannot happen
> > > within Bioc (as broken packages do not propagate to the repository)
> > > and such cases are exclusively due to breaking of external
> > > dependencies such as observed with KEGGREST and KEGG (where we can
> > > hardly do much about it).
> > >
> > >
> > > Thus, it remains to clarify on the purpose of the ‚build‘ shield
> > > as Wolfgang pointed out.
> > > While it is surely helpful for the developer to grasp what is going
> > > on at a glance, this might be misleading for users and reviewers as
> > > described earlier.
> > >
> > >
> >
> > The purpose of the build shield is to alert you to the fact that the
> > build is broken. If the build is broken due to a dependency, it's not
> > true that there is nothing you can do about it; as Michael points out,
> > you can alert the maintainer of the broken package or you can (as I
> > did) contact KEGG who promptly fixed their issue. This benefits the
> > community as a whole.
> >
> > There are other types of dependency-related errors, for example if a
> > package you depend on changes its API and you do not adapt to those
> > changes, your package will break, but YOU need to fix your package,
> > nobody else's package needs to change.
> >
> > I think it is exceedingly difficult to determine programmatically
> > whether a given failure was caused by a dependency or by the package
> > itself, and I'm not sure it's a good idea to try.
> >
> > I recognize that it can be bad for a reviewer to see the red build shield.
> > But the purpose is to alert the DEVELOPER to problems and I would
> > reiterate that there is always something you as the package author can
> > do, whether it's alerting the upstream developer to the problem, or if
> > that doesn't work, removing the dependency.
> >
> > Dan
> >
> >
> > > Ludwig
> > >
> > >
> > >
> > >
> > >
> > >
> > > On 24.09.2015 at 19:31, Dan Tenenbaum <dtenenba at fredhutch.org> wrote:
> > >
> > >
> > >
> > > ----- Original Message -----
> > >
> > >
> > > From: "Andrzej Oleś" < andrzej.oles at gmail.com >
> > > To: "Dan Tenenbaum" < dtenenba at fredhutch.org >
> > > Cc: bioc-devel at r-project.org , "Wolfgang Huber" < whuber at embl.de >
> > > Sent: Thursday, September 24, 2015 10:28:14 AM
> > > Subject: Re: [Bioc-devel] Shouldn't we distinguish between
> > > package-specific and dependency errors?
> > >
> > >
> > >
> > >
> > >
> > > Hi Dan,
> > >
> > > thank you for clarifying! I had this impression after looking at
> > >
> > > http://bioconductor.org/checkResults/devel/bioc-LATEST/flowcatchR/
> > > and
> > >
> > > http://bioconductor.org/checkResults/devel/bioc-LATEST/GenomicInteractions/
> > >
> > > which both produce errors during R CMD check, nevertheless, these
> > > problematic versions are available on the corresponding package
> > > landing pages. Probably that's because the package started failing
> > > check only sometime after the update...
> > >
> > >
> > > Yes, that is probably what happened. Also, a maintainer can change a
> > > package without bumping the version number. In this case, even if
> > > the package builds and checks, it will not be propagated since there
> > > was no version bump.
> > >
> > > Dan
> > >
> > >
> > >
> > >
> > >
> > > Best,
> > > Andrzej
> > >
> > >
> > >
> > > On Thu, Sep 24, 2015 at 6:12 PM, Dan Tenenbaum <
> > > dtenenba at fredhutch.org > wrote:
> > >
> > >
> > >
> > >
> > > ----- Original Message -----
> > >
> > >
> > > From: "Andrzej Oleś" < andrzej.oles at gmail.com >
> > > To: "Wolfgang Huber" < whuber at embl.de >
> > > Cc: bioc-devel at r-project.org
> > > Sent: Thursday, September 24, 2015 5:56:20 AM
> > > Subject: Re: [Bioc-devel] Shouldn't we distinguish between
> > > package-specific and dependency errors?
> > >
> > > Hi,
> > >
> > > We need to distinguish here between build/install and check errors.
> > > The first ones hold back the package update (instead, the last working
> > > version is used). On the other hand, check errors do not hold the
> > > package from propagating into the repository, causing collateral
> > > damage (at least that's what I observe in the devel branch).
> > >
> > >
> > > If a package does not pass R CMD check, it does not propagate into
> > > the repository.
> > >
> > > Dan
> > >
> > >
> > >
> > >
> > >
> > >
> > > A good example is EBImage which is currently broken for all
> > > architectures but Linux (see:
> > > http://bioconductor.org/checkResults/devel/bioc-LATEST/EBImage/ ).
> > > It doesn't affect its downstream dependencies because the error occurs
> > > at build stage, see for example imageHTS (
> > > http://bioconductor.org/packages/3.2/bioc/html/imageHTS.html ).
> > > Fair enough, EBImage has a red badge, whereas imageHTS has a green one.
> > >
> > > So the issue raised by Ludwig occurs only with packages which fail
> > > during check. Maybe changing the publication policy in such cases,
> > > i.e. holding the updated package from going into the repository when
> > > it fails 'R CMD check', would help to address the problem, at least
> > > for BioC packages?
> > >
> > > Best,
> > > Andrzej
> > >
> > > On Thu, Sep 24, 2015 at 2:22 PM, Wolfgang Huber < whuber at embl.de >
> > > wrote:
> > >
> > >
> > >
> > > It seems that the “build” shield on the package landing page
> > > conflates things that happen in the package, and in its
> > > dependencies.
> > > Do we have a clear spec of what the purpose of that shield is?
> > >
> > > Something to avoid IMHO is creating incentives for package
> > > developers to reduce dependencies to make their package “look”
> > > more robust, at the cost of duplication or functionality.
> > >
> > > Wolfgang
> > >
> > >
> > >
> > >
> > > On 24 Sep 2015, at 14:13, Ludwig Geistlinger <
> > > Ludwig.Geistlinger at bio.ifi.lmu.de > wrote:
> > >
> > >
> > >
> > >
> > >
> > > Do you have any information on how often this kind of breakage
> > > occurs?
> > >
> > > Having my package ~1 year in, I would say that happened roughly
> > > 5 times to me.
> > >
> > > I wonder whether other developers could comment on their experience
> > > with that as well.
> > >
> > >
> > >
> > >
> > > On Thu, Sep 24, 2015 at 4:35 AM, Ludwig Geistlinger <
> > > Ludwig.Geistlinger at bio.ifi.lmu.de > wrote:
> > >
> > >
> > >
> > > Dear Bioc-Team,
> > >
> > > I would like to make this point brought up by Weijun more general.
> > > He reported a considerable number of packages to be broken by
> > > (recursively) depending on KEGGREST - which actually broke due to
> > > KEGG itself (however, this seems to be resolved by the current
> > > build).
> > >
> > > Nevertheless, given that a dependency can break your package at any
> > > time, it is currently hard to devise a robust and stable software
> > > product even within the semi-annual release.
> > >
> > >
> > > Do you have any information on how often this kind of breakage
> > > occurs?
> > >
> > >
> > >
> > >
> > > Thus, I wonder whether Bioc packages in release (at least those
> > > having other packages depending on them) shouldn't always be rolled
> > > back to the last version that passed build and check without error,
> > > in order to ensure functioning of packages down the hierarchy.
> > >
> > > Based on these considerations, I also wonder whether the shield on
> > > the package landing page indicating the result of the package
> > > building (ok/warning/error) shouldn't distinguish between errors
> > > caused by dependencies and errors caused by the package itself.
> > >
> > > Imagine the not too unrealistic case of a new Bioc package presented
> > > in a Software article under review.
> > > Without doubt, a reviewer will be negatively influenced by the
> > > 'error' shield indicating that the package has not been properly
> > > worked out. This is fair enough if the package's own code produces
> > > these bugs, but the opposite is true if that is due to a broken
> > > dependency.
> > >
> > >
> > > Recent developments at the Volkswagen company should help raise
> > > general awareness that software development and maintenance is a
> > > fraught process.
> > >
> > >
> > >
> > >
> > > If software S depends on software T and T is unreliable then so is S.
> > > The negative influence of events of the sort you describe has
> > > potential value.
> > >
> > >
> > >
> > >
> > >
> > > I believe there are ways of using containers so that software can be
> > > distributed in a verified working state, perhaps suitable for a
> > > fully predictable review, but I doubt this is a real solution to the
> > > actual problem.
> > >
> > >
> > >
> > >
> > >
> > > In the worst case, the package will run fine the whole time the
> > > article is prepared, but breaks due to a broken dependency the day
> > > the reviewer starts to evaluate the manuscript.
> > >
> > > I know that this does not resolve problems of dependencies outside
> > > of BioC such as for KEGGREST with KEGG.
> > > But at least for dependencies within BioC, I wonder whether this is
> > > a point worth considering.
> > >
> > > Thanks & Best,
> > > Ludwig
> > >
> > >
> > > --
> > > Dipl.-Bioinf. Ludwig Geistlinger
> > >
> > > Lehr- und Forschungseinheit für Bioinformatik Institut für
> > > Informatik Ludwig-Maximilians-Universität München
> > > Amalienstrasse 17, 2. Stock, Büro A201
> > > 80333 München
> > >
> > > Tel.: 089-2180-4067
> > > eMail: Ludwig.Geistlinger at bio.ifi.lmu.de
> > >
> > >
> > >
> > >
> > > Hi Weijun,
> > >
> > > ----- Original Message -----
> > >
> > >
> > > From: "Luo Weijun" < luo_weijun at yahoo.com >
> > > To: maintainer at bioconductor.org , dtenenba at fredhutch.org
> > > Cc: "Martin Morgan" < mtmorgan at fredhutch.org >,
> > > Bioc-devel at r-project.org
> > >
> > >
> > >
> > >
> > > Sent: Wednesday, September 23, 2015 9:44:13 AM
> > > Subject: KEGG REST issue
> > >
> > > Dear BioC team,
> > > I noticed a problem with the keggLink() function of the KEGGREST
> > > package, and it can be traced back to the KEGG REST API Linked entries.
> > > Some of this API's functionality is broken. For example, the following
> > > line used to get all gene-pathway mappings for human, but retrieves
> > > nothing now.
> > > path.hsa <- KEGGREST::keggLink("pathway", "hsa")
> > >
> > > In fact, these two bulk queries with the REST API
> > > don't work anymore.
> > >
> > >
> > >
> > >
> > > http://rest.kegg.jp/link/pathway/hsa
> > > http://rest.kegg.jp/link/hsa/pathway
> > > but smaller queries on Linked entries seem to be fine. Not sure
> > > whether other REST API functions are affected or not. As a
> > > consequence, KEGGREST and many dependent packages had build errors.
> > >
> > >
> > >
> > > http://bioconductor.org/checkResults/release/bioc-LATEST/KEGGREST/morelia-buildsrc.html
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > Anyway, I just wanted to let you know about this; see if you can do
> > > anything about it.
> > >
> > > Yes, I am aware of this. It's an issue on the KEGG side and I have
> > > contacted the KEGG team. I have not heard back yet.
> > >
> > > Dan
> > >
> > >
> > >
> > >
> > > Weijun
> > >
> > >
> > >
> > > _______________________________________________
> > > Bioc-devel at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/bioc-devel