[Bioc-devel] A handful of checks to follow up on R CMD BiocCheck

Wed Nov 2 23:00:19 CET 2016

Me again :)

Please find attached the first patch to print the first 6 lines over 80
characters long. (I'll get to the tabulation offenders next).

Note that all the offending lines are stored in the "df.length" data.frame.
How about an option like "fullReport=c(FALSE, TRUE)" that prints *all* the
offending lines?
The data.frame also stores the content of the lines for the record, but
does not print them. I think Kasper is right: filename and line should be
enough to track down the line.
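
For what it's worth, the collect-then-report idea can be sketched outside
BiocCheck. This is only a rough illustration and not the attached patch:
findLongLines() and reportLongLines() are hypothetical names, and fullReport
is just the option floated above, not an existing BiocCheck argument.

```r
# Hypothetical helper (not part of BiocCheck): collect every line longer than
# `limit` characters from a set of files into a single data.frame.
findLongLines <- function(files, limit = 80L) {
    empty <- data.frame(File = character(0), Line = integer(0),
                        Length = integer(0), stringsAsFactors = FALSE)
    hits <- lapply(files, function(file) {
        lines <- readLines(file, warn = FALSE)
        n <- nchar(lines, allowNA = TRUE)
        # Keep the original line numbers; skip lines whose length is NA.
        long <- which(!is.na(n) & n > limit)
        data.frame(File = rep(file, length(long)), Line = long,
                   Length = n[long], stringsAsFactors = FALSE)
    })
    do.call(rbind, c(list(empty), hits))
}

# Hypothetical reporter: print the first 6 offenders by default, or all of
# them when fullReport = TRUE. Returns the messages invisibly.
reportLongLines <- function(df, fullReport = FALSE) {
    shown <- if (fullReport) df else head(df)
    if (nrow(shown) == 0L) {
        return(invisible(character(0)))
    }
    msgs <- sprintf("%s (line %i): %i characters",
                    shown$File, shown$Line, shown$Length)
    message(paste(msgs, collapse = "\n"))
    invisible(msgs)
}
```

Something like reportLongLines(findLongLines(list.files("R", full.names=TRUE)))
would then list the first 6 offenders, and fullReport=TRUE would list them all.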

All the best,
Kevin



On Wed, Nov 2, 2016 at 8:08 PM, Kevin RUE <kevinrue67 at gmail.com> wrote:

> Thanks for the feedback!
>
> I also tend to prefer *all* the lines being reported (or to be honest,
> that was really true when I had lots of them; a problem that I largely
> mitigated by fixing all of them once and subsequently paying more attention
> while developing).
>
> Printing the content of the offending line somewhat helps me spot the line
> faster (more so for tab issues). But I must admit that showing the whole
> line is somewhat "overkill". I just started thinking of a compromise being
> to only show the first N characters of the line, with N being 80 minus the
> number of characters necessary to print the filename and line number.
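
A rough sketch of that compromise, assuming a hypothetical helper
formatOffender() (not a BiocCheck function) and a "file:line: " prefix chosen
purely for illustration:

```r
# Hypothetical formatter: show as much of the offending line as fits in
# `width` columns once the "file:line: " prefix has been printed.
formatOffender <- function(file, line, content, width = 80L) {
    prefix <- sprintf("%s:%d: ", file, line)
    # Characters left for the content; never negative for very long prefixes.
    room <- pmax(width - nchar(prefix), 0L)
    paste0(prefix, substr(content, 1L, room))
}
```

Note that nchar() counts a tab as a single character, so a line containing
tabs may still display wider than 80 columns in a terminal.
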
>
> Thanks Martin for pointing out the lines in BiocCheck. (Now I feel bad for
> not having checked sooner.. hehe!)
> I think the idea of BiocCheck showing the first 6 offenders is quite nice,
> as I rarely have more since I use the RStudio "Tools > Global Options >
> Code > Display > Show Margin > Margin column: 80" feature.
>
> I'll have a go at both approaches (developing BiocCheck and my own scripts).
>
> Cheers,
> Kevin
>
>
> On Wed, Nov 2, 2016 at 7:41 PM, Kasper Daniel Hansen <
> kasperdanielhansen at gmail.com> wrote:
>
>> I would prefer all line numbers reported, but on the other hand I am
>> indifferent wrt. the content of the line, unless (say) TABs are marked up
>> somehow.
>>
>> Kasper
>>
>> On Wed, Nov 2, 2016 at 3:17 PM, Martin Morgan <
>> martin.morgan at roswellpark.org> wrote:
>>
>>> On 11/02/2016 02:49 PM, Kevin RUE wrote:
>>>
>>>> Dear all,
>>>>
>>>> Just thought I'd share a handful of scripts that I wrote to follow up on
>>>> certain NOTE messages thrown by R CMD BiocCheck.
>>>>
>>>> https://github.com/kevinrue/BiocCheckTools
>>>>
>>>> They're very simple, but I occasionally find them quite convenient.
>>>> Apologies if something similar already exists somewhere :)
>>>>
>>>
>>> Maybe consider creating a diff against the source code that, e.g.,
>>> reported the first 6 offenders? The relevant lines are near
>>>
>>> https://github.com/Bioconductor-mirror/BiocCheck/blob/master
>>> /R/checks.R#L1081
>>>
>>> Martin
>>>
>>>
>>>> All the best,
>>>> Kevin
>>>>
>>>> _______________________________________________
>>>> Bioc-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>
>>>>
>>>
>>
>>
>
-------------- next part --------------

diff --git a/R/checks.R b/R/checks.R
index f0b5844..9b1f273 100644
--- a/R/checks.R
+++ b/R/checks.R
@@ -1057,14 +1057,12 @@ checkFormatting <- function(pkgdir)
     tablines <- 0L
     badindentlines <- 0L
     ok <- TRUE
-    
-    df.length <- data.frame(stringsAsFactors=FALSE)
-    df.i <- 1
+
     for (file in files)
     {
-        pkgname <- getPkgNameFromPkgDir(pkgdir)
         if (file.exists(file) && file.info(file)$size == 0)
         {
+            pkgname <- getPkgNameFromPkgDir(pkgdir)
             handleNote(sprintf("Add content to the empty file %s.",
                 mungeName(file, pkgname)))
         }
@@ -1074,23 +1072,14 @@ checkFormatting <- function(pkgdir)
             lines <- readLines(file, warn=FALSE)
             totallines <- totallines + length(lines)
             n <- nchar(lines, allowNA=TRUE)
-            n <- n[!is.na(n)]; lines <- lines[!is.na(n)]
+            n <- n[!is.na(n)]
 
             names(n) <- seq_along(1:length(n))
-            long <- which(n > 80)
-            
+            long <- n[n > 80]
             if (length(long))
             {
                 ## TODO/FIXME We could tell the user here which lines are long
                 ## in which files.
-                for (i in long){
-                    df.length[df.i,1] <- mungeName(file, pkgname) # filename
-                    df.length[df.i,2] <- names(n)[i] # line number
-                    df.length[df.i,3] <- lines[i] # content
-                    df.length[df.i,4] <- n[i] # length
-                    df.i <- df.i + 1
-                }
-                
                 longlines <- longlines + length(long)
             }
 
@@ -1111,22 +1100,12 @@ checkFormatting <- function(pkgdir)
 
         }
     }
-    colnames(df.length) <- c("File", "Line", "Content", "Length")
-    
     if (longlines > 0)
     {
         ok <- FALSE
-        h.length <- head(df.length)
         handleNote(sprintf(
             "Consider shorter lines; %s lines (%i%%) are > 80 characters long.",
             longlines, as.integer((longlines/totallines) * 100)))
-        message(sprintf("    The first %i lines are:", nrow(h.length)))
-        for (i in 1:nrow(h.length))
-        {
-            row <- h.length[i,]
-            message(sprintf("      %s (line %s): %s characters",
-                            row$File, row$Line, row$Length))
-        }
     }
     if (tablines > 0)
     {