[Bioc-devel] A handful of check to follow up on R CMD BiocCheck

Thu Nov 3 13:14:38 CET 2016

Apologies for the additional spam, for two reasons:

   - The diff files that I've previously sent had the base and modified
   versions swapped. This new one fixes that.
   - This new diff file (always relative to the code I cloned from
   Bioconductor-mirror) also fixes a bug whereby the updated code would fail
   on packages that do not break any of the three guidelines.

Best,
Kevin

On Thu, Nov 3, 2016 at 11:49 AM, Kevin RUE <kevinrue67 at gmail.com> wrote:

> Hi all,
>
> Please find attached the diff relative to the code that I cloned from
> Bioconductor-mirror yesterday (please ignore the previous diff file).
>
> Basically three new features:
>
>    - As per previous email: display up to the first 6 lines that are over
>    80 characters long
>    - *New*: display up to the first 6 lines that are not indented by a
>    multiple of 4 spaces
>    - *New*: display up to the first 6 lines that use TAB instead of 4
>    spaces for indentation
>
> I also attach the output of the updated code
> <https://github.com/kevinrue/BiocCheck>, to illustrate the changes.
>
> Notes:
>
>    - For demonstration purpose, I indented a handful of lines from the
>    checks.R file itself with TAB characters. I assume that's OK, as some lines
>    were already longer than 80 characters and not indented by a multiple of 4
>    spaces.
>
> All the best,
> Kevin
>
> On Wed, Nov 2, 2016 at 10:00 PM, Kevin RUE <kevinrue67 at gmail.com> wrote:
>
>> Me again :)
>>
>> Please find attached the first patch to print the first 6 lines over 80
>> characters long. (I'll get to the tabulation offenders next).
>>
>> Note that all the offending lines are stored in the "df.length"
>> data.frame. How about an option like "fullReport=c(FALSE, TRUE)" that print
>> *all* the offending lines?
>> The data.frame also stores the content of the lines for the record, but
>> does not print them. I think Kasper is right: filename and line should be
>> enough to track down the line.
>>
>> All the best,
>> Kevin
>>
>>
>>
>> On Wed, Nov 2, 2016 at 8:08 PM, Kevin RUE <kevinrue67 at gmail.com> wrote:
>>
>>> Thanks for the feedback!
>>>
>>> I also tend to prefer *all* the lines being reported (or to be honest,
>>> that was really true when I had lots of them; a problem that I largely
>>> mitigated by fixing all of them once and subsequently paying more attention
>>> while developing).
>>>
>>> Printing the content of the offending line somewhat helps me spot the
>>> line faster (more so for tab issues). But I must admit that showing the
>>> whole line is somewhat "overkill". I just started thinking of a compromise
>>> being to only show the first N characters of the line, with N being 80
>>> minus the number of characters necessary to print the filename and line
>>> number.
>>>
>>> Thanks Martin for pointing out the lines in BiocCheck. (Now I feel bad
>>> for not having checked sooner.. hehe!)
>>> I think the idea of BiocCheck showing the first 6 offenders in BiocCheck
>>> quite nice, as I rarely have more since I use using the RStudio "Tools >
>>> Global Options > Code > Display > Show Margin > Margin column: 80" feature.
>>>
>>> I'll give a go at both approaches (developing BiocCheck and my own
>>> scripts)
>>>
>>> Cheers,
>>> Kevin
>>>
>>>
>>> On Wed, Nov 2, 2016 at 7:41 PM, Kasper Daniel Hansen <
>>> kasperdanielhansen at gmail.com> wrote:
>>>
>>>> I would prefer all line numbers reported, but on the other hand I am
>>>> indifferent wrt. the content of the line, unless (say) TABs are marked up
>>>> somehow.
>>>>
>>>> Kasper
>>>>
>>>> On Wed, Nov 2, 2016 at 3:17 PM, Martin Morgan <
>>>> martin.morgan at roswellpark.org> wrote:
>>>>
>>>>> On 11/02/2016 02:49 PM, Kevin RUE wrote:
>>>>>
>>>>>> Dear all,
>>>>>>
>>>>>> Just thought I'd share a handful of scripts that I wrote to follow up
>>>>>> on
>>>>>> certain NOTE messages thrown by R CMD BiocCheck.
>>>>>>
>>>>>> https://github.com/kevinrue/BiocCheckTools
>>>>>>
>>>>>> They're very simple, but I occasionally find them quite convenient.
>>>>>> Apologies if something similar already exists somewhere :)
>>>>>>
>>>>>
>>>>> Maybe consider creating a diff against the source code that, e.g.,
>>>>> reported the first 6 offenders? The relevant lines are near
>>>>>
>>>>> https://github.com/Bioconductor-mirror/BiocCheck/blob/master
>>>>> /R/checks.R#L1081
>>>>>
>>>>> Martin
>>>>>
>>>>>
>>>>>> All the best,
>>>>>> Kevin
>>>>>>
>>>>>>         [[alternative HTML version deleted]]
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioc-devel at r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>>
>>>>>>
>>>>>
>>>>> This email message may contain legally privileged
>>>>> and/or...{{dropped:2}}
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Bioc-devel at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>
>>>>
>>>>
>>>
>>
>
-------------- next part --------------

diff --git a/R/checks.R b/R/checks.R
index 9b1f273..69c61ea 100644
--- a/R/checks.R
+++ b/R/checks.R
@@ -1057,12 +1057,15 @@ checkFormatting <- function(pkgdir)
     tablines <- 0L
     badindentlines <- 0L
     ok <- TRUE
-
+    
+    df.length <- data.frame(stringsAsFactors=FALSE)
+    df.indent <- data.frame(stringsAsFactors=FALSE)
+    df.tab <- data.frame(stringsAsFactors=FALSE)
     for (file in files)
     {
+        pkgname <- getPkgNameFromPkgDir(pkgdir)
         if (file.exists(file) && file.info(file)$size == 0)
         {
-            pkgname <- getPkgNameFromPkgDir(pkgdir)
             handleNote(sprintf("Add content to the empty file %s.",
                 mungeName(file, pkgname)))
         }
@@ -1072,14 +1075,22 @@ checkFormatting <- function(pkgdir)
             lines <- readLines(file, warn=FALSE)
             totallines <- totallines + length(lines)
             n <- nchar(lines, allowNA=TRUE)
-            n <- n[!is.na(n)]
+            n <- n[!is.na(n)]; lines <- lines[!is.na(n)]
 
             names(n) <- seq_along(1:length(n))
-            long <- n[n > 80]
+            long <- which(n > 80)
+            
             if (length(long))
             {
-                ## TODO/FIXME We could tell the user here which lines are long
-                ## in which files.
+                df.i <- 1
+                for (i in long){
+                    df.length[df.i,1] <- mungeName(file, pkgname) # filename
+                    df.length[df.i,2] <- names(n)[i] # line number
+                    df.length[df.i,3] <- lines[i] # content
+                    df.length[df.i,4] <- n[i] # length
+                    df.i <- df.i + 1
+                }
+                colnames(df.length) <- c("File", "Line", "Content", "Length")
                 longlines <- longlines + length(long)
             }
 
@@ -1087,42 +1098,88 @@ checkFormatting <- function(pkgdir)
             if (any(tabs))
             {
                 tablines <- tablines + length(which(tabs))
+                df.i <- 1
+                for (i in which(tabs)){
+                	df.tab[df.i,1] <- mungeName(file, pkgname) # filename
+                	df.tab[df.i,2] <- names(n)[i] # line number
+                	df.tab[df.i,3] <- lines[i] # content
+                	df.i <- df.i + 1
+                	colnames(df.tab) <- c("File", "Line", "Content")
+                }
             }
 
             res <- regexpr("^([ ]+)", lines)
             if (any(res > -1))
             {
                 match.length <- attr(res, "match.length")
-                indents <- match.length[match.length > -1]
-                badindentlinesinthisfile <- length(which(indents %% 4 != 0))
+                badindents <- which(
+                    match.length > -1 &
+                        match.length %% 4 != 0
+                )
+                badindentlinesinthisfile <- length(badindents)
                 badindentlines <- badindentlines + badindentlinesinthisfile
             }
+            if (badindentlinesinthisfile){
+                df.i <- 1
+                for (i in badindents){
+                    df.indent[df.i,1] <- mungeName(file, pkgname) # filename
+                    df.indent[df.i,2] <- names(n)[i] # line number
+                    df.indent[df.i,3] <- lines[i] # content
+                    df.indent[df.i,4] <- match.length[i] # indent width
+                    df.i <- df.i + 1
+                }
+                colnames(df.indent) <- c("File", "Line", "Content", "Indent")
+            }
 
         }
     }
+    
     if (longlines > 0)
     {
         ok <- FALSE
+        h.length <- head(df.length)
         handleNote(sprintf(
             "Consider shorter lines; %s lines (%i%%) are > 80 characters long.",
             longlines, as.integer((longlines/totallines) * 100)))
+        message(sprintf("    The first %i lines are:", nrow(h.length)))
+        for (i in 1:nrow(h.length))
+        {
+            row <- h.length[i,]
+            message(sprintf("      %s (line %s): %s characters",
+                            row$File, row$Line, row$Length))
+        }
     }
     if (tablines > 0)
     {
         ok <- FALSE
+        h.tab <- head(df.tab)
         handleNote(sprintf(
             "Consider 4 spaces instead of tabs; %s lines (%i%%) contain tabs.",
             tablines, as.integer((tablines/totallines) * (100/1) )))
+        message(sprintf("    The first %i lines are:", nrow(h.tab)))
+        for (i in 1:nrow(h.tab))
+        {
+            row <- h.tab[i,]
+            message(sprintf("      %s (line %s)",
+                            row$File, row$Line))
+        }
     }
     if (badindentlines > 0)
     {
         ok <- FALSE
+        h.tab <- head(df.indent)
         handleNote(sprintf(paste(
             "Consider indenting lines with a multiple of 4 spaces;",
             "%s lines (%i%%) are not."),
             badindentlines,
             as.integer((badindentlines/totallines) * 100)))
-
+        message(sprintf("    The first %i lines are:", nrow(h.tab)))
+        for (i in 1:nrow(h.tab))
+        {
+            row <- h.tab[i,]
+            message(sprintf("      %s (line %s): %s spaces",
+                            row$File, row$Line, row$Indent))
+        }
     }
 
     if (!ok)