[Bioc-devel] A handful of check to follow up on R CMD BiocCheck

Thu Nov 10 12:22:48 CET 2016

On 11/10/2016 05:36 AM, Kevin RUE wrote:
> Hi Martin,
>
> Thanks for spending the time and effort to refactor and integrate my
> code into BiocCheck. I didn't intend to give you so much to think about !
>
> For the record, I used the "checkFunctionLengths" method (now starting
> at line 829 of checks.R) as a template to offer code somewhat homogenous
> with what I could see in place already. I believe that function should
> also be refactored using your new "Context" and "handleContext" helpers,
> right?

yep

>
> Side note:
> Far from trying to compete with the BiocCheck Bioconductor package, I
> couldn't help but update my "follow-up" "BiocCheckTools" Python code

I think the right place for this detailed stuff is actually in lintr and 
/ or formatR.

> <https://github.com/kevinrue/BiocCheckTools/blob/master/README.md> in
> the meantime (sorry for unoriginal name, I'll take suggestions!) taking
> Kasper's advice on board:
>
>   * Single master file which runs everything and gives a report.
>       o *Bonus*: each module can be run separately (line_chars; tab_width).
>   * All line numbers are reported
>   * Content of the line is not displayed
>   * I didn't mark up TABs, but I indicate how many spaces they are made
>     of (when not a multiple of 4)
>
> All the best,
> Kevin
>
>
> On Wed, Nov 9, 2016 at 11:31 PM, Martin Morgan
> <martin.morgan at roswellpark.org <mailto:martin.morgan at roswellpark.org>>
> wrote:
>
>     On 11/03/2016 08:14 AM, Kevin RUE wrote:
>
>         Apologies for the additional spam, for two reasons:
>
>           * The diff files that I've previously sent had the base and
>         modified
>             versions swapped. This new one fixes that.
>           * This new diff file (always relative to the code I cloned from
>             Bioconductor-mirror) also fixes a bug whereby the updated
>         code would
>             fail on packages that do not break any of the three guidelines.
>
>
>     Hi Kevin -- thanks for these, I incorporated a version in 1.11.1.
>
>     The report is for the first 6 lines of offenders; the code for
>     finding all offenders is mentioned in the vignette
>
>       BiocCheck:::checkFormatting("path/to/YourPackage", nlines=Inf)
>
>     Marcel Ramos mentions the lintr (CRAN) and goodpractice (github
>     only, https://github.com/MangoTheCat/goodpractice
>     <https://github.com/MangoTheCat/goodpractice>) packages as useful
>     resources for these types of questions; also formatR (CRAN).
>
>     For the record, it was fun to work through your code. The basic
>     pattern was to read lines and flag those that violated a (usually
>     simple) test, add these to a data frame, then if after parsing all
>     files the data frame had any rows report these to the user.
>
>     I standardized the columns of the data frame across checks to
>     facilitate re-use, and imposed the standard by using a constructor
>     -- non-exported Context(). I took a little bit of pain to make the
>     constructor work (return a data frame with the correct columns but
>     no rows) when there are no lines of code flagged to be added to the
>     context, so that I could use it without have to insert a bunch of
>     conditional `if (...)` in my code that make it both harder to
>     understand and harder to test. Also, the constructor is vectorized,
>     whereas your implementation was iterative.
>
>     For reporting to the user, I wrote a small helper handleContext()
>     that I could re-use across the different types of checks (and likely
>     elsewhere in the package) and also maintain the separation of what
>     is reported to the user (e.g., in handleContext()) from how it is
>     reported to the user -- .msg, .verbatim, etc in the utils.R file.
>     Probably the reporting could have been refactored further.
>
>     I feel guilty about not writing unit tests; the package does include
>     tests, including a handy package-generator function to make packages
>     that are invalid in various ways. It seems like a better way to go
>     would be to further refactor the body of the checkFormatting()
>     function so that one could for instance invoke a function
>     checkLineLengths(pkg, file, lines) and get back a data.frame created
>     by Context(); it would then be easy to test these for various
>     'lines' without having to figure out how to make test packages. But
>     this seemed of course like too much additional work for this feature
>     (I'll probably regret not expending the effort at a future date).
>
>     I also found myself using the horrible 'copy-and-append' paradigm
>
>        ctxt = Context()
>        for (fl in files) {
>            ...
>            ctxt = rbind(ctxt, Context(...)
>         }
>
>     which makes a copy of (the increasingly large) ctxt each time
>     through the loop and therefore scales quadratically (when each file
>     has some problems) with the number of files. I'm counting on the
>     number of files to be small (e.g., less than 10000, when the
>     copy-and-append pattern starts to be expensive even for trivial
>     operations; mzR has ~8750 files, though many of these are not
>     checked for formatting) so that this pattern does not become a
>     bottleneck. It was still painful to use...
>
>     Martin
>
>
>         Best,
>         Kevin
>
>
>         On Thu, Nov 3, 2016 at 11:49 AM, Kevin RUE <kevinrue67 at gmail.com
>         <mailto:kevinrue67 at gmail.com>
>         <mailto:kevinrue67 at gmail.com <mailto:kevinrue67 at gmail.com>>> wrote:
>
>             Hi all,
>
>             Please find attached the diff relative to the code that I cloned
>             from Bioconductor-mirror yesterday (please ignore the
>         previous diff
>             file).
>
>             Basically three new features:
>
>               * As per previous email: display up to the first 6 lines
>         that are
>                 over 80 characters long
>               * *New*: display up to the first 6 lines that are not
>         indented by
>                 a multiple of 4 spaces
>               * *New*: display up to the first 6 lines that use TAB
>         instead of 4
>                 spaces for indentation
>
>             I also attach the output of the updated code
>             <https://github.com/kevinrue/BiocCheck
>         <https://github.com/kevinrue/BiocCheck>>, to illustrate the changes.
>
>             Notes:
>
>               * For demonstration purpose, I indented a handful of lines
>         from
>                 the checks.R file itself with TAB characters. I assume
>         that's
>                 OK, as some lines were already longer than 80 characters
>         and not
>                 indented by a multiple of 4 spaces.
>
>             All the best,
>             Kevin
>
>
>             On Wed, Nov 2, 2016 at 10:00 PM, Kevin RUE
>         <kevinrue67 at gmail.com <mailto:kevinrue67 at gmail.com>
>             <mailto:kevinrue67 at gmail.com <mailto:kevinrue67 at gmail.com>>>
>         wrote:
>
>                 Me again :)
>
>                 Please find attached the first patch to print the first
>         6 lines
>                 over 80 characters long. (I'll get to the tabulation
>         offenders
>                 next).
>
>                 Note that all the offending lines are stored in the
>         "df.length"
>                 data.frame. How about an option like
>         "fullReport=c(FALSE, TRUE)"
>                 that print *all* the offending lines?
>                 The data.frame also stores the content of the lines for the
>                 record, but does not print them. I think Kasper is right:
>                 filename and line should be enough to track down the line.
>
>                 All the best,
>                 Kevin
>
>
>
>                 On Wed, Nov 2, 2016 at 8:08 PM, Kevin RUE
>         <kevinrue67 at gmail.com <mailto:kevinrue67 at gmail.com>
>                 <mailto:kevinrue67 at gmail.com
>         <mailto:kevinrue67 at gmail.com>>> wrote:
>
>                     Thanks for the feedback!
>
>                     I also tend to prefer *all* the lines being reported
>         (or to
>                     be honest, that was really true when I had lots of
>         them; a
>                     problem that I largely mitigated by fixing all of
>         them once
>                     and subsequently paying more attention while
>         developing).
>
>                     Printing the content of the offending line somewhat
>         helps me
>                     spot the line faster (more so for tab issues). But I
>         must
>                     admit that showing the whole line is somewhat
>         "overkill". I
>                     just started thinking of a compromise being to only
>         show the
>                     first N characters of the line, with N being 80
>         minus the
>                     number of characters necessary to print the filename and
>                     line number.
>
>                     Thanks Martin for pointing out the lines in
>         BiocCheck. (Now
>                     I feel bad for not having checked sooner.. hehe!)
>                     I think the idea of BiocCheck showing the first 6
>         offenders
>                     in BiocCheck quite nice, as I rarely have more since
>         I use
>                     using the RStudio "Tools > Global Options > Code >
>         Display >
>                     Show Margin > Margin column: 80" feature.
>
>                     I'll give a go at both approaches (developing
>         BiocCheck and
>                     my own scripts)
>
>                     Cheers,
>                     Kevin
>
>
>                     On Wed, Nov 2, 2016 at 7:41 PM, Kasper Daniel Hansen
>                     <kasperdanielhansen at gmail.com
>         <mailto:kasperdanielhansen at gmail.com>
>                     <mailto:kasperdanielhansen at gmail.com
>         <mailto:kasperdanielhansen at gmail.com>>> wrote:
>
>                         I would prefer all line numbers reported, but on the
>                         other hand I am indifferent wrt. the content of the
>                         line, unless (say) TABs are marked up somehow.
>
>                         Kasper
>
>                         On Wed, Nov 2, 2016 at 3:17 PM, Martin Morgan
>                         <martin.morgan at roswellpark.org
>         <mailto:martin.morgan at roswellpark.org>
>                         <mailto:martin.morgan at roswellpark.org
>         <mailto:martin.morgan at roswellpark.org>>> wrote:
>
>                             On 11/02/2016 02:49 PM, Kevin RUE wrote:
>
>                                 Dear all,
>
>                                 Just thought I'd share a handful of
>         scripts that
>                                 I wrote to follow up on
>                                 certain NOTE messages thrown by R CMD
>         BiocCheck.
>
>
>         https://github.com/kevinrue/BiocCheckTools
>         <https://github.com/kevinrue/BiocCheckTools>
>
>         <https://github.com/kevinrue/BiocCheckTools
>         <https://github.com/kevinrue/BiocCheckTools>>
>
>                                 They're very simple, but I occasionally find
>                                 them quite convenient.
>                                 Apologies if something similar already
>         exists
>                                 somewhere :)
>
>
>                             Maybe consider creating a diff against the
>         source
>                             code that, e.g., reported the first 6
>         offenders? The
>                             relevant lines are near
>
>
>         https://github.com/Bioconductor-mirror/BiocCheck/blob/master/R/checks.R#L1081
>         <https://github.com/Bioconductor-mirror/BiocCheck/blob/master/R/checks.R#L1081>
>
>         <https://github.com/Bioconductor-mirror/BiocCheck/blob/master/R/checks.R#L1081
>         <https://github.com/Bioconductor-mirror/BiocCheck/blob/master/R/checks.R#L1081>>
>
>                             Martin
>
>
>                                 All the best,
>                                 Kevin
>
>                                         [[alternative HTML version deleted]]
>
>
>         _______________________________________________
>                                 Bioc-devel at r-project.org
>         <mailto:Bioc-devel at r-project.org>
>                                 <mailto:Bioc-devel at r-project.org
>         <mailto:Bioc-devel at r-project.org>> mailing list
>
>         https://stat.ethz.ch/mailman/listinfo/bioc-devel
>         <https://stat.ethz.ch/mailman/listinfo/bioc-devel>
>
>         <https://stat.ethz.ch/mailman/listinfo/bioc-devel
>         <https://stat.ethz.ch/mailman/listinfo/bioc-devel>>
>
>
>
>                             This email message may contain legally
>         privileged
>                             and/or...{{dropped:2}}
>
>
>                             _______________________________________________
>                             Bioc-devel at r-project.org
>         <mailto:Bioc-devel at r-project.org>
>                             <mailto:Bioc-devel at r-project.org
>         <mailto:Bioc-devel at r-project.org>> mailing list
>
>         https://stat.ethz.ch/mailman/listinfo/bioc-devel
>         <https://stat.ethz.ch/mailman/listinfo/bioc-devel>
>
>         <https://stat.ethz.ch/mailman/listinfo/bioc-devel
>         <https://stat.ethz.ch/mailman/listinfo/bioc-devel>>
>
>
>
>
>
>
>
>
>     This email message may contain legally privileged and/or
>     confidential information.  If you are not the intended recipient(s),
>     or the employee or agent responsible for the delivery of this
>     message to the intended recipient(s), you are hereby notified that
>     any disclosure, copying, distribution, or use of this email message
>     is prohibited.  If you have received this message in error, please
>     notify the sender immediately by e-mail and delete this email
>     message from your computer. Thank you.
>
>

This email message may contain legally privileged and/or...{{dropped:2}}