[Bioc-devel] A handful of check to follow up on R CMD BiocCheck
Kevin RUE
kevinrue67 at gmail.com
Thu Nov 3 13:14:38 CET 2016
Apologies for the additional spam, for two reasons:
- The diff files that I've previously sent had the base and modified
versions swapped. This new one fixes that.
- This new diff file (always relative to the code I cloned from
Bioconductor-mirror) also fixes a bug whereby the updated code would fail
on packages that do not break any of the three guidelines.
Best,
Kevin
On Thu, Nov 3, 2016 at 11:49 AM, Kevin RUE <kevinrue67 at gmail.com> wrote:
> Hi all,
>
> Please find attached the diff relative to the code that I cloned from
> Bioconductor-mirror yesterday (please ignore the previous diff file).
>
> Basically three new features:
>
> - As per previous email: display up to the first 6 lines that are over
> 80 characters long
> - *New*: display up to the first 6 lines that are not indented by a
> multiple of 4 spaces
> - *New*: display up to the first 6 lines that use TAB instead of 4
> spaces for indentation
>
> I also attach the output of the updated code
> <https://github.com/kevinrue/BiocCheck>, to illustrate the changes.
>
> Notes:
>
> - For demonstration purpose, I indented a handful of lines from the
> checks.R file itself with TAB characters. I assume that's OK, as some lines
> were already longer than 80 characters and not indented by a multiple of 4
> spaces.
>
> All the best,
> Kevin
>
> On Wed, Nov 2, 2016 at 10:00 PM, Kevin RUE <kevinrue67 at gmail.com> wrote:
>
>> Me again :)
>>
>> Please find attached the first patch to print the first 6 lines over 80
>> characters long. (I'll get to the tabulation offenders next).
>>
>> Note that all the offending lines are stored in the "df.length"
>> data.frame. How about an option like "fullReport=c(FALSE, TRUE)" that print
>> *all* the offending lines?
>> The data.frame also stores the content of the lines for the record, but
>> does not print them. I think Kasper is right: filename and line should be
>> enough to track down the line.
>>
>> All the best,
>> Kevin
>>
>>
>>
>> On Wed, Nov 2, 2016 at 8:08 PM, Kevin RUE <kevinrue67 at gmail.com> wrote:
>>
>>> Thanks for the feedback!
>>>
>>> I also tend to prefer *all* the lines being reported (or to be honest,
>>> that was really true when I had lots of them; a problem that I largely
>>> mitigated by fixing all of them once and subsequently paying more attention
>>> while developing).
>>>
>>> Printing the content of the offending line somewhat helps me spot the
>>> line faster (more so for tab issues). But I must admit that showing the
>>> whole line is somewhat "overkill". I just started thinking of a compromise
>>> being to only show the first N characters of the line, with N being 80
>>> minus the number of characters necessary to print the filename and line
>>> number.
>>>
>>> Thanks Martin for pointing out the lines in BiocCheck. (Now I feel bad
>>> for not having checked sooner.. hehe!)
>>> I think the idea of BiocCheck showing the first 6 offenders in BiocCheck
>>> quite nice, as I rarely have more since I use using the RStudio "Tools >
>>> Global Options > Code > Display > Show Margin > Margin column: 80" feature.
>>>
>>> I'll give a go at both approaches (developing BiocCheck and my own
>>> scripts)
>>>
>>> Cheers,
>>> Kevin
>>>
>>>
>>> On Wed, Nov 2, 2016 at 7:41 PM, Kasper Daniel Hansen <
>>> kasperdanielhansen at gmail.com> wrote:
>>>
>>>> I would prefer all line numbers reported, but on the other hand I am
>>>> indifferent wrt. the content of the line, unless (say) TABs are marked up
>>>> somehow.
>>>>
>>>> Kasper
>>>>
>>>> On Wed, Nov 2, 2016 at 3:17 PM, Martin Morgan <
>>>> martin.morgan at roswellpark.org> wrote:
>>>>
>>>>> On 11/02/2016 02:49 PM, Kevin RUE wrote:
>>>>>
>>>>>> Dear all,
>>>>>>
>>>>>> Just thought I'd share a handful of scripts that I wrote to follow up
>>>>>> on
>>>>>> certain NOTE messages thrown by R CMD BiocCheck.
>>>>>>
>>>>>> https://github.com/kevinrue/BiocCheckTools
>>>>>>
>>>>>> They're very simple, but I occasionally find them quite convenient.
>>>>>> Apologies if something similar already exists somewhere :)
>>>>>>
>>>>>
>>>>> Maybe consider creating a diff against the source code that, e.g.,
>>>>> reported the first 6 offenders? The relevant lines are near
>>>>>
>>>>> https://github.com/Bioconductor-mirror/BiocCheck/blob/master
>>>>> /R/checks.R#L1081
>>>>>
>>>>> Martin
>>>>>
>>>>>
>>>>>> All the best,
>>>>>> Kevin
>>>>>>
>>>>>> [[alternative HTML version deleted]]
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioc-devel at r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>>
>>>>>>
>>>>>
>>>>> This email message may contain legally privileged
>>>>> and/or...{{dropped:2}}
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Bioc-devel at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>
>>>>
>>>>
>>>
>>
>
-------------- next part --------------
diff --git a/R/checks.R b/R/checks.R
index 9b1f273..69c61ea 100644
--- a/R/checks.R
+++ b/R/checks.R
@@ -1057,12 +1057,15 @@ checkFormatting <- function(pkgdir)
tablines <- 0L
badindentlines <- 0L
ok <- TRUE
-
+
+ df.length <- data.frame(stringsAsFactors=FALSE)
+ df.indent <- data.frame(stringsAsFactors=FALSE)
+ df.tab <- data.frame(stringsAsFactors=FALSE)
for (file in files)
{
+ pkgname <- getPkgNameFromPkgDir(pkgdir)
if (file.exists(file) && file.info(file)$size == 0)
{
- pkgname <- getPkgNameFromPkgDir(pkgdir)
handleNote(sprintf("Add content to the empty file %s.",
mungeName(file, pkgname)))
}
@@ -1072,14 +1075,22 @@ checkFormatting <- function(pkgdir)
lines <- readLines(file, warn=FALSE)
totallines <- totallines + length(lines)
n <- nchar(lines, allowNA=TRUE)
- n <- n[!is.na(n)]
+ n <- n[!is.na(n)]; lines <- lines[!is.na(n)]
names(n) <- seq_along(1:length(n))
- long <- n[n > 80]
+ long <- which(n > 80)
+
if (length(long))
{
- ## TODO/FIXME We could tell the user here which lines are long
- ## in which files.
+ df.i <- 1
+ for (i in long){
+ df.length[df.i,1] <- mungeName(file, pkgname) # filename
+ df.length[df.i,2] <- names(n)[i] # line number
+ df.length[df.i,3] <- lines[i] # content
+ df.length[df.i,4] <- n[i] # length
+ df.i <- df.i + 1
+ }
+ colnames(df.length) <- c("File", "Line", "Content", "Length")
longlines <- longlines + length(long)
}
@@ -1087,42 +1098,88 @@ checkFormatting <- function(pkgdir)
if (any(tabs))
{
tablines <- tablines + length(which(tabs))
+ df.i <- 1
+ for (i in which(tabs)){
+ df.tab[df.i,1] <- mungeName(file, pkgname) # filename
+ df.tab[df.i,2] <- names(n)[i] # line number
+ df.tab[df.i,3] <- lines[i] # content
+ df.i <- df.i + 1
+ colnames(df.tab) <- c("File", "Line", "Content")
+ }
}
res <- regexpr("^([ ]+)", lines)
if (any(res > -1))
{
match.length <- attr(res, "match.length")
- indents <- match.length[match.length > -1]
- badindentlinesinthisfile <- length(which(indents %% 4 != 0))
+ badindents <- which(
+ match.length > -1 &
+ match.length %% 4 != 0
+ )
+ badindentlinesinthisfile <- length(badindents)
badindentlines <- badindentlines + badindentlinesinthisfile
}
+ if (badindentlinesinthisfile){
+ df.i <- 1
+ for (i in badindents){
+ df.indent[df.i,1] <- mungeName(file, pkgname) # filename
+ df.indent[df.i,2] <- names(n)[i] # line number
+ df.indent[df.i,3] <- lines[i] # content
+ df.indent[df.i,4] <- match.length[i] # indent width
+ df.i <- df.i + 1
+ }
+ colnames(df.indent) <- c("File", "Line", "Content", "Indent")
+ }
}
}
+
if (longlines > 0)
{
ok <- FALSE
+ h.length <- head(df.length)
handleNote(sprintf(
"Consider shorter lines; %s lines (%i%%) are > 80 characters long.",
longlines, as.integer((longlines/totallines) * 100)))
+ message(sprintf(" The first %i lines are:", nrow(h.length)))
+ for (i in 1:nrow(h.length))
+ {
+ row <- h.length[i,]
+ message(sprintf(" %s (line %s): %s characters",
+ row$File, row$Line, row$Length))
+ }
}
if (tablines > 0)
{
ok <- FALSE
+ h.tab <- head(df.tab)
handleNote(sprintf(
"Consider 4 spaces instead of tabs; %s lines (%i%%) contain tabs.",
tablines, as.integer((tablines/totallines) * (100/1) )))
+ message(sprintf(" The first %i lines are:", nrow(h.tab)))
+ for (i in 1:nrow(h.tab))
+ {
+ row <- h.tab[i,]
+ message(sprintf(" %s (line %s)",
+ row$File, row$Line))
+ }
}
if (badindentlines > 0)
{
ok <- FALSE
+ h.tab <- head(df.indent)
handleNote(sprintf(paste(
"Consider indenting lines with a multiple of 4 spaces;",
"%s lines (%i%%) are not."),
badindentlines,
as.integer((badindentlines/totallines) * 100)))
-
+ message(sprintf(" The first %i lines are:", nrow(h.tab)))
+ for (i in 1:nrow(h.tab))
+ {
+ row <- h.tab[i,]
+ message(sprintf(" %s (line %s): %s spaces",
+ row$File, row$Line, row$Indent))
+ }
}
if (!ok)
More information about the Bioc-devel
mailing list