[Bioc-devel] A handful of check to follow up on R CMD BiocCheck
Martin Morgan
martin.morgan at roswellpark.org
Thu Nov 10 12:22:48 CET 2016
On 11/10/2016 05:36 AM, Kevin RUE wrote:
> Hi Martin,
>
> Thanks for spending the time and effort to refactor and integrate my
> code into BiocCheck. I didn't intend to give you so much to think about !
>
> For the record, I used the "checkFunctionLengths" method (now starting
> at line 829 of checks.R) as a template to offer code somewhat homogenous
> with what I could see in place already. I believe that function should
> also be refactored using your new "Context" and "handleContext" helpers,
> right?
yep
>
> Side note:
> Far from trying to compete with the BiocCheck Bioconductor package, I
> couldn't help but update my "follow-up" "BiocCheckTools" Python code
I think the right place for this detailed stuff is actually in lintr and
/ or formatR.
> <https://github.com/kevinrue/BiocCheckTools/blob/master/README.md> in
> the meantime (sorry for unoriginal name, I'll take suggestions!) taking
> Kasper's advice on board:
>
> * Single master file which runs everything and gives a report.
> o *Bonus*: each module can be run separately (line_chars; tab_width).
> * All line numbers are reported
> * Content of the line is not displayed
> * I didn't mark up TABs, but I indicate how many spaces they are made
> of (when not a multiple of 4)
>
> All the best,
> Kevin
>
>
> On Wed, Nov 9, 2016 at 11:31 PM, Martin Morgan
> <martin.morgan at roswellpark.org <mailto:martin.morgan at roswellpark.org>>
> wrote:
>
> On 11/03/2016 08:14 AM, Kevin RUE wrote:
>
> Apologies for the additional spam, for two reasons:
>
> * The diff files that I've previously sent had the base and
> modified
> versions swapped. This new one fixes that.
> * This new diff file (always relative to the code I cloned from
> Bioconductor-mirror) also fixes a bug whereby the updated
> code would
> fail on packages that do not break any of the three guidelines.
>
>
> Hi Kevin -- thanks for these, I incorporated a version in 1.11.1.
>
> The report is for the first 6 lines of offenders; the code for
> finding all offenders is mentioned in the vignette
>
> BiocCheck:::checkFormatting("path/to/YourPackage", nlines=Inf)
>
> Marcel Ramos mentions the lintr (CRAN) and goodpractice (github
> only, https://github.com/MangoTheCat/goodpractice
> <https://github.com/MangoTheCat/goodpractice>) packages as useful
> resources for these types of questions; also formatR (CRAN).
>
> For the record, it was fun to work through your code. The basic
> pattern was to read lines and flag those that violated a (usually
> simple) test, add these to a data frame, then if after parsing all
> files the data frame had any rows report these to the user.
>
> I standardized the columns of the data frame across checks to
> facilitate re-use, and imposed the standard by using a constructor
> -- non-exported Context(). I took a little bit of pain to make the
> constructor work (return a data frame with the correct columns but
> no rows) when there are no lines of code flagged to be added to the
> context, so that I could use it without have to insert a bunch of
> conditional `if (...)` in my code that make it both harder to
> understand and harder to test. Also, the constructor is vectorized,
> whereas your implementation was iterative.
>
> For reporting to the user, I wrote a small helper handleContext()
> that I could re-use across the different types of checks (and likely
> elsewhere in the package) and also maintain the separation of what
> is reported to the user (e.g., in handleContext()) from how it is
> reported to the user -- .msg, .verbatim, etc in the utils.R file.
> Probably the reporting could have been refactored further.
>
> I feel guilty about not writing unit tests; the package does include
> tests, including a handy package-generator function to make packages
> that are invalid in various ways. It seems like a better way to go
> would be to further refactor the body of the checkFormatting()
> function so that one could for instance invoke a function
> checkLineLengths(pkg, file, lines) and get back a data.frame created
> by Context(); it would then be easy to test these for various
> 'lines' without having to figure out how to make test packages. But
> this seemed of course like too much additional work for this feature
> (I'll probably regret not expending the effort at a future date).
>
> I also found myself using the horrible 'copy-and-append' paradigm
>
> ctxt = Context()
> for (fl in files) {
> ...
> ctxt = rbind(ctxt, Context(...)
> }
>
> which makes a copy of (the increasingly large) ctxt each time
> through the loop and therefore scales quadratically (when each file
> has some problems) with the number of files. I'm counting on the
> number of files to be small (e.g., less than 10000, when the
> copy-and-append pattern starts to be expensive even for trivial
> operations; mzR has ~8750 files, though many of these are not
> checked for formatting) so that this pattern does not become a
> bottleneck. It was still painful to use...
>
> Martin
>
>
> Best,
> Kevin
>
>
> On Thu, Nov 3, 2016 at 11:49 AM, Kevin RUE <kevinrue67 at gmail.com
> <mailto:kevinrue67 at gmail.com>
> <mailto:kevinrue67 at gmail.com <mailto:kevinrue67 at gmail.com>>> wrote:
>
> Hi all,
>
> Please find attached the diff relative to the code that I cloned
> from Bioconductor-mirror yesterday (please ignore the
> previous diff
> file).
>
> Basically three new features:
>
> * As per previous email: display up to the first 6 lines
> that are
> over 80 characters long
> * *New*: display up to the first 6 lines that are not
> indented by
> a multiple of 4 spaces
> * *New*: display up to the first 6 lines that use TAB
> instead of 4
> spaces for indentation
>
> I also attach the output of the updated code
> <https://github.com/kevinrue/BiocCheck
> <https://github.com/kevinrue/BiocCheck>>, to illustrate the changes.
>
> Notes:
>
> * For demonstration purpose, I indented a handful of lines
> from
> the checks.R file itself with TAB characters. I assume
> that's
> OK, as some lines were already longer than 80 characters
> and not
> indented by a multiple of 4 spaces.
>
> All the best,
> Kevin
>
>
> On Wed, Nov 2, 2016 at 10:00 PM, Kevin RUE
> <kevinrue67 at gmail.com <mailto:kevinrue67 at gmail.com>
> <mailto:kevinrue67 at gmail.com <mailto:kevinrue67 at gmail.com>>>
> wrote:
>
> Me again :)
>
> Please find attached the first patch to print the first
> 6 lines
> over 80 characters long. (I'll get to the tabulation
> offenders
> next).
>
> Note that all the offending lines are stored in the
> "df.length"
> data.frame. How about an option like
> "fullReport=c(FALSE, TRUE)"
> that print *all* the offending lines?
> The data.frame also stores the content of the lines for the
> record, but does not print them. I think Kasper is right:
> filename and line should be enough to track down the line.
>
> All the best,
> Kevin
>
>
>
> On Wed, Nov 2, 2016 at 8:08 PM, Kevin RUE
> <kevinrue67 at gmail.com <mailto:kevinrue67 at gmail.com>
> <mailto:kevinrue67 at gmail.com
> <mailto:kevinrue67 at gmail.com>>> wrote:
>
> Thanks for the feedback!
>
> I also tend to prefer *all* the lines being reported
> (or to
> be honest, that was really true when I had lots of
> them; a
> problem that I largely mitigated by fixing all of
> them once
> and subsequently paying more attention while
> developing).
>
> Printing the content of the offending line somewhat
> helps me
> spot the line faster (more so for tab issues). But I
> must
> admit that showing the whole line is somewhat
> "overkill". I
> just started thinking of a compromise being to only
> show the
> first N characters of the line, with N being 80
> minus the
> number of characters necessary to print the filename and
> line number.
>
> Thanks Martin for pointing out the lines in
> BiocCheck. (Now
> I feel bad for not having checked sooner.. hehe!)
> I think the idea of BiocCheck showing the first 6
> offenders
> in BiocCheck quite nice, as I rarely have more since
> I use
> using the RStudio "Tools > Global Options > Code >
> Display >
> Show Margin > Margin column: 80" feature.
>
> I'll give a go at both approaches (developing
> BiocCheck and
> my own scripts)
>
> Cheers,
> Kevin
>
>
> On Wed, Nov 2, 2016 at 7:41 PM, Kasper Daniel Hansen
> <kasperdanielhansen at gmail.com
> <mailto:kasperdanielhansen at gmail.com>
> <mailto:kasperdanielhansen at gmail.com
> <mailto:kasperdanielhansen at gmail.com>>> wrote:
>
> I would prefer all line numbers reported, but on the
> other hand I am indifferent wrt. the content of the
> line, unless (say) TABs are marked up somehow.
>
> Kasper
>
> On Wed, Nov 2, 2016 at 3:17 PM, Martin Morgan
> <martin.morgan at roswellpark.org
> <mailto:martin.morgan at roswellpark.org>
> <mailto:martin.morgan at roswellpark.org
> <mailto:martin.morgan at roswellpark.org>>> wrote:
>
> On 11/02/2016 02:49 PM, Kevin RUE wrote:
>
> Dear all,
>
> Just thought I'd share a handful of
> scripts that
> I wrote to follow up on
> certain NOTE messages thrown by R CMD
> BiocCheck.
>
>
> https://github.com/kevinrue/BiocCheckTools
> <https://github.com/kevinrue/BiocCheckTools>
>
> <https://github.com/kevinrue/BiocCheckTools
> <https://github.com/kevinrue/BiocCheckTools>>
>
> They're very simple, but I occasionally find
> them quite convenient.
> Apologies if something similar already
> exists
> somewhere :)
>
>
> Maybe consider creating a diff against the
> source
> code that, e.g., reported the first 6
> offenders? The
> relevant lines are near
>
>
> https://github.com/Bioconductor-mirror/BiocCheck/blob/master/R/checks.R#L1081
> <https://github.com/Bioconductor-mirror/BiocCheck/blob/master/R/checks.R#L1081>
>
> <https://github.com/Bioconductor-mirror/BiocCheck/blob/master/R/checks.R#L1081
> <https://github.com/Bioconductor-mirror/BiocCheck/blob/master/R/checks.R#L1081>>
>
> Martin
>
>
> All the best,
> Kevin
>
> [[alternative HTML version deleted]]
>
>
> _______________________________________________
> Bioc-devel at r-project.org
> <mailto:Bioc-devel at r-project.org>
> <mailto:Bioc-devel at r-project.org
> <mailto:Bioc-devel at r-project.org>> mailing list
>
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> <https://stat.ethz.ch/mailman/listinfo/bioc-devel>
>
> <https://stat.ethz.ch/mailman/listinfo/bioc-devel
> <https://stat.ethz.ch/mailman/listinfo/bioc-devel>>
>
>
>
> This email message may contain legally
> privileged
> and/or...{{dropped:2}}
>
>
> _______________________________________________
> Bioc-devel at r-project.org
> <mailto:Bioc-devel at r-project.org>
> <mailto:Bioc-devel at r-project.org
> <mailto:Bioc-devel at r-project.org>> mailing list
>
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> <https://stat.ethz.ch/mailman/listinfo/bioc-devel>
>
> <https://stat.ethz.ch/mailman/listinfo/bioc-devel
> <https://stat.ethz.ch/mailman/listinfo/bioc-devel>>
>
>
>
>
>
>
>
>
> This email message may contain legally privileged and/or
> confidential information. If you are not the intended recipient(s),
> or the employee or agent responsible for the delivery of this
> message to the intended recipient(s), you are hereby notified that
> any disclosure, copying, distribution, or use of this email message
> is prohibited. If you have received this message in error, please
> notify the sender immediately by e-mail and delete this email
> message from your computer. Thank you.
>
>
This email message may contain legally privileged and/or...{{dropped:2}}
More information about the Bioc-devel
mailing list