[Bioc-devel] A handful of check to follow up on R CMD BiocCheck

Kevin RUE kevinrue67 at gmail.com
Thu Nov 3 12:49:46 CET 2016


Hi all,

Please find attached the diff relative to the code that I cloned from
Bioconductor-mirror yesterday (please ignore the previous diff file).

Basically three new features:

   - As per previous email: display up to the first 6 lines that are over
   80 characters long
   - *New*: display up to the first 6 lines that are not indented by a
   multiple of 4 spaces
   - *New*: display up to the first 6 lines that use TAB instead of 4
   spaces for indentation

I also attach the output of the updated code
<https://github.com/kevinrue/BiocCheck>, to illustrate the changes.

Notes:

   - For demonstration purpose, I indented a handful of lines from the
   checks.R file itself with TAB characters. I assume that's OK, as some lines
   were already longer than 80 characters and not indented by a multiple of 4
   spaces.

All the best,
Kevin

On Wed, Nov 2, 2016 at 10:00 PM, Kevin RUE <kevinrue67 at gmail.com> wrote:

> Me again :)
>
> Please find attached the first patch to print the first 6 lines over 80
> characters long. (I'll get to the tabulation offenders next).
>
> Note that all the offending lines are stored in the "df.length"
> data.frame. How about an option like "fullReport=c(FALSE, TRUE)" that print
> *all* the offending lines?
> The data.frame also stores the content of the lines for the record, but
> does not print them. I think Kasper is right: filename and line should be
> enough to track down the line.
>
> All the best,
> Kevin
>
>
>
> On Wed, Nov 2, 2016 at 8:08 PM, Kevin RUE <kevinrue67 at gmail.com> wrote:
>
>> Thanks for the feedback!
>>
>> I also tend to prefer *all* the lines being reported (or to be honest,
>> that was really true when I had lots of them; a problem that I largely
>> mitigated by fixing all of them once and subsequently paying more attention
>> while developing).
>>
>> Printing the content of the offending line somewhat helps me spot the
>> line faster (more so for tab issues). But I must admit that showing the
>> whole line is somewhat "overkill". I just started thinking of a compromise
>> being to only show the first N characters of the line, with N being 80
>> minus the number of characters necessary to print the filename and line
>> number.
>>
>> Thanks Martin for pointing out the lines in BiocCheck. (Now I feel bad
>> for not having checked sooner.. hehe!)
>> I think the idea of BiocCheck showing the first 6 offenders in BiocCheck
>> quite nice, as I rarely have more since I use using the RStudio "Tools >
>> Global Options > Code > Display > Show Margin > Margin column: 80" feature.
>>
>> I'll give a go at both approaches (developing BiocCheck and my own
>> scripts)
>>
>> Cheers,
>> Kevin
>>
>>
>> On Wed, Nov 2, 2016 at 7:41 PM, Kasper Daniel Hansen <
>> kasperdanielhansen at gmail.com> wrote:
>>
>>> I would prefer all line numbers reported, but on the other hand I am
>>> indifferent wrt. the content of the line, unless (say) TABs are marked up
>>> somehow.
>>>
>>> Kasper
>>>
>>> On Wed, Nov 2, 2016 at 3:17 PM, Martin Morgan <
>>> martin.morgan at roswellpark.org> wrote:
>>>
>>>> On 11/02/2016 02:49 PM, Kevin RUE wrote:
>>>>
>>>>> Dear all,
>>>>>
>>>>> Just thought I'd share a handful of scripts that I wrote to follow up
>>>>> on
>>>>> certain NOTE messages thrown by R CMD BiocCheck.
>>>>>
>>>>> https://github.com/kevinrue/BiocCheckTools
>>>>>
>>>>> They're very simple, but I occasionally find them quite convenient.
>>>>> Apologies if something similar already exists somewhere :)
>>>>>
>>>>
>>>> Maybe consider creating a diff against the source code that, e.g.,
>>>> reported the first 6 offenders? The relevant lines are near
>>>>
>>>> https://github.com/Bioconductor-mirror/BiocCheck/blob/master
>>>> /R/checks.R#L1081
>>>>
>>>> Martin
>>>>
>>>>
>>>>> All the best,
>>>>> Kevin
>>>>>
>>>>>         [[alternative HTML version deleted]]
>>>>>
>>>>> _______________________________________________
>>>>> Bioc-devel at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>
>>>>>
>>>>
>>>> This email message may contain legally privileged and/or...{{dropped:2}}
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioc-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>
>>>
>>>
>>
>
-------------- next part --------------
diff --git a/R/checks.R b/R/checks.R
index 02e3841..9b1f273 100644
--- a/R/checks.R
+++ b/R/checks.R
@@ -1057,15 +1057,12 @@ checkFormatting <- function(pkgdir)
     tablines <- 0L
     badindentlines <- 0L
     ok <- TRUE
-    
-    df.length <- data.frame(stringsAsFactors=FALSE)
-    df.indent <- data.frame(stringsAsFactors=FALSE)
-    df.tab <- data.frame(stringsAsFactors=FALSE)
+
     for (file in files)
     {
-        pkgname <- getPkgNameFromPkgDir(pkgdir)
         if (file.exists(file) && file.info(file)$size == 0)
         {
+            pkgname <- getPkgNameFromPkgDir(pkgdir)
             handleNote(sprintf("Add content to the empty file %s.",
                 mungeName(file, pkgname)))
         }
@@ -1075,22 +1072,14 @@ checkFormatting <- function(pkgdir)
             lines <- readLines(file, warn=FALSE)
             totallines <- totallines + length(lines)
             n <- nchar(lines, allowNA=TRUE)
-            n <- n[!is.na(n)]; lines <- lines[!is.na(n)]
+            n <- n[!is.na(n)]
 
             names(n) <- seq_along(1:length(n))
-            long <- which(n > 80)
-            
+            long <- n[n > 80]
             if (length(long))
             {
-                df.i <- 1
-                for (i in long){
-                    df.length[df.i,1] <- mungeName(file, pkgname) # filename
-                    df.length[df.i,2] <- names(n)[i] # line number
-                    df.length[df.i,3] <- lines[i] # content
-                    df.length[df.i,4] <- n[i] # length
-                    df.i <- df.i + 1
-                }
-                
+                ## TODO/FIXME We could tell the user here which lines are long
+                ## in which files.
                 longlines <- longlines + length(long)
             }
 
@@ -1098,89 +1087,42 @@ checkFormatting <- function(pkgdir)
             if (any(tabs))
             {
                 tablines <- tablines + length(which(tabs))
-                df.i <- 1
-                for (i in which(tabs)){
-                	df.tab[df.i,1] <- mungeName(file, pkgname) # filename
-                	df.tab[df.i,2] <- names(n)[i] # line number
-                	df.tab[df.i,3] <- lines[i] # content
-                	df.i <- df.i + 1
-                }
             }
 
             res <- regexpr("^([ ]+)", lines)
             if (any(res > -1))
             {
                 match.length <- attr(res, "match.length")
-                badindents <- which(
-                    match.length > -1 &
-                        match.length %% 4 != 0
-                )
-                badindentlinesinthisfile <- length(badindents)
+                indents <- match.length[match.length > -1]
+                badindentlinesinthisfile <- length(which(indents %% 4 != 0))
                 badindentlines <- badindentlines + badindentlinesinthisfile
             }
-            if (badindentlinesinthisfile){
-                df.i <- 1
-                for (i in badindents){
-                    df.indent[df.i,1] <- mungeName(file, pkgname) # filename
-                    df.indent[df.i,2] <- names(n)[i] # line number
-                    df.indent[df.i,3] <- lines[i] # content
-                    df.indent[df.i,4] <- match.length[i] # indent width
-                    df.i <- df.i + 1
-                }
-            }
 
         }
     }
-    colnames(df.length) <- c("File", "Line", "Content", "Length")
-    colnames(df.tab) <- c("File", "Line", "Content")
-    colnames(df.indent) <- c("File", "Line", "Content", "Indent")
-    
     if (longlines > 0)
     {
         ok <- FALSE
-        h.length <- head(df.length)
         handleNote(sprintf(
             "Consider shorter lines; %s lines (%i%%) are > 80 characters long.",
             longlines, as.integer((longlines/totallines) * 100)))
-        message(sprintf("    The first %i lines are:", nrow(h.length)))
-        for (i in 1:nrow(h.length))
-        {
-            row <- h.length[i,]
-            message(sprintf("      %s (line %s): %s characters",
-                            row$File, row$Line, row$Length))
-        }
     }
     if (tablines > 0)
     {
         ok <- FALSE
-        h.tab <- head(df.tab)
         handleNote(sprintf(
             "Consider 4 spaces instead of tabs; %s lines (%i%%) contain tabs.",
             tablines, as.integer((tablines/totallines) * (100/1) )))
-        message(sprintf("    The first %i lines are:", nrow(h.tab)))
-        for (i in 1:nrow(h.tab))
-        {
-            row <- h.tab[i,]
-            message(sprintf("      %s (line %s)",
-                            row$File, row$Line))
-        }
     }
     if (badindentlines > 0)
     {
         ok <- FALSE
-        h.tab <- head(df.indent)
         handleNote(sprintf(paste(
             "Consider indenting lines with a multiple of 4 spaces;",
             "%s lines (%i%%) are not."),
             badindentlines,
             as.integer((badindentlines/totallines) * 100)))
-        message(sprintf("    The first %i lines are:", nrow(h.tab)))
-        for (i in 1:nrow(h.tab))
-        {
-            row <- h.tab[i,]
-            message(sprintf("      %s (line %s): %s spaces",
-                            row$File, row$Line, row$Indent))
-        }
+
     }
 
     if (!ok)
-------------- next part --------------
> BiocCheck("BiocCheck")
* This is BiocCheck, version 1.11.0.
* BiocCheck is a work in progress. Output and severity of issues may change.
* Installing package...
* Checking vignette directory...
* This is a software package, checking vignette directories...
      # of chunks: 3, # of eval=FALSE: 2 (66%)
    * WARNING: Evaluate more vignette chunks.
* Checking version number...
* Checking version number validity...
* Checking R Version dependency...
    * WARNING: Update R version dependency from 3.3.0 to 3.4.
* Checking biocViews...
* Checking that biocViews are present...
* Checking for non-trivial biocViews...
* Checking that biocViews come from the same category...
* Checking biocViews validity...
* Checking for recommended biocViews...
* Checking build system compatibility...
* Checking for blank lines in DESCRIPTION...
* Checking for whitespace in DESCRIPTION field names...
* Checking that Package field matches dir/tarball name...
* Checking for Version field...
* Checking for valid maintainer...
* Checking unit tests...
* Checking native routine registration...
* Checking for deprecated package usage...
* Checking parsed R code in R directory, examples, vignettes...
* Checking for direct slot access...
* Checking for T...
* Checking for F...
* Checking for browser()...
* Checking for <<-...
      Found <<- in R/checks.R (line 521, column 14)
    * NOTE: Avoid '<<-'' if possible (found in 1 files)
* Checking for library/require of BiocCheck...
* Checking DESCRIPTION/NAMESPACE consistency...
    * WARNING: Import utils in DESCRIPTION as well as NAMESPACE.
* Checking function lengths.......
  The longest function is 218 lines long
  The longest 5 functions are:
      BiocCheck() (R/BiocCheck.R, line 45): 218 lines
      checkFormatting() (R/checks.R, line 1044): 147 lines
      checkForBadDepends() (R/checks.R, line 570): 106 lines
      checkBiocViews() (R/checks.R, line 222): 93 lines
      checkBBScompatibility() (R/checks.R, line 316): 93 lines
* Checking man pages...
* Checking exported objects have runnable examples...
* Checking package NEWS...
    * NOTE: Consider adding a NEWS file, so your package news will be included in Bioconductor release
      announcements.
* Checking formatting of DESCRIPTION, NAMESPACE, man pages, R source, and vignette source...
    * NOTE: Consider shorter lines; 26 lines (1%) are > 80 characters long.
    The first 6 lines are:
      cCheck/R/util.R (line 127): 87 characters
      cCheck/R/util.R (line 218): 82 characters
      cCheck/R/checks.R (line 435): 82 characters
      cCheck/R/checks.R (line 459): 110 characters
      cCheck/R/checks.R (line 472): 81 characters
      cCheck/R/checks.R (line 644): 87 characters
    * NOTE: Consider 4 spaces instead of tabs; 4 lines (0%) contain tabs.
    The first 4 lines are:
      cCheck/R/checks.R (line 1103)
      cCheck/R/checks.R (line 1104)
      cCheck/R/checks.R (line 1105)
      cCheck/R/checks.R (line 1106)
    * NOTE: Consider indenting lines with a multiple of 4 spaces; 99 lines (3%) are not.
    The first 6 lines are:
      cCheck/R/util.R (line 288): 7 spaces
      cCheck/R/checks.R (line 1251): 13 spaces
      cCheck/R/checks.R (line 1252): 13 spaces
      cCheck/R/checks.R (line 1253): 13 spaces
      cCheck/vignettes/BiocCheck.Rmd (line 22): 2 spaces
      cCheck/vignettes/BiocCheck.Rmd (line 24): 2 spaces
  See http://bioconductor.org/developers/how-to/coding-style/
* Checking for canned comments in man pages...
* Checking if package already exists in CRAN...
* Checking for support site registration...
* Maintainer email is ok.


Summary:
ERROR count: 0
WARNING count: 0
NOTE count: 0
For detailed information about these checks, see the BiocCheck vignette, available at
http://bioconductor.org/packages/3.5/bioc/vignettes/BiocCheck/inst/doc/BiocCheck.html#interpreting-bioccheck-output


More information about the Bioc-devel mailing list