[R-pkg-devel] duplicate function during build

Enrico Schumann es at enricoschumann.net
Wed Jul 27 22:42:51 CEST 2016


On Sat, 23 Jul 2016, ProfJCNash <profjcnash at gmail.com> writes:

> Thanks Sven. That indeed works. And if anyone has ideas how it could be
> put into R so Windows users could benefit, I'm sure it would be useful
> in checks of packages.

You could use R functionality to rewrite the shell
commands. Perhaps along these lines:

--8<---------------cut here---------------start------------->8---
fun_names <- function(dir,
                      duplicates_only = TRUE,
                      file_pattern = "[.][rR]$",
                      fun_pattern = " *([^\\s]+) *<- *function.*") {

    files <- dir(dir, pattern = file_pattern, full.names = TRUE)
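    ans <- data.frame(fun = character(0),
                      file = character(0),
                      line = integer(0),
                      stringsAsFactors = FALSE)
    for (f in files) {
        txt <- readLines(f)
        ## perl = TRUE, as in the gsub() call below, so that both
        ## calls interpret fun_pattern in the same way
        fun.lines <- grepl(fun_pattern, txt, perl = TRUE)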
        
        if (any(fun.lines)) {
            ans <- rbind(ans,
                         data.frame(fun = gsub(fun_pattern, "\\1",
                                               txt[fun.lines],
                                               perl = TRUE),
                                    file = f,
                                    line = which(fun.lines),
                                    stringsAsFactors = FALSE))
        }
    }

    ans <- ans[order(ans[["fun"]]), ]

    if (duplicates_only) {
        d <- duplicated(ans[["fun"]])
        d0 <- match(unique(ans[["fun"]][d]), ans[["fun"]])
        ans <- ans[sort(c(d0, which(d))),]
    }

    ans
}
--8<---------------cut here---------------end--------------->8---

One would then call the function on a directory.

For instance,

  fun_names("~/Packages/NMOF/R")

gives me output

         fun                                    file line
10  cfHeston       /home/es/Packages/NMOF/R/callCF.R   41
18  cfHeston /home/es/Packages/NMOF/R/callHestoncf.R   29

## [...] 

But it will be tricky to catch only such re-definitions
of functions that have been left in the files by
mistake. For instance, I often define short helper
functions within other functions, and such helper
functions might then get flagged, too.
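
One way around this might be to let R's parser do the
work: parse() returns only the top-level expressions of
a file, so helper functions defined inside other
functions are never visited. What follows is only a
minimal sketch. It assumes that definitions take the
form 'name <- function(...)' or 'name = function(...)';
the name top_level_funs is made up for illustration, and
functions created by other means (assign(), setMethod()
and the like) are not detected.

```r
## Sketch: report top-level function definitions only.
## 'top_level_funs' is a hypothetical name, not an existing function.
top_level_funs <- function(dir,
                           duplicates_only = TRUE,
                           file_pattern = "[.][rR]$") {

    files <- list.files(dir, pattern = file_pattern, full.names = TRUE)
    ans <- data.frame(fun = character(0),
                      file = character(0),
                      line = integer(0),
                      stringsAsFactors = FALSE)
    for (f in files) {
        exprs <- parse(f, keep.source = TRUE)
        ## first line of every top-level expression
        lines <- vapply(attr(exprs, "srcref"),
                        function(s) as.integer(s)[1L], integer(1))
        for (i in seq_along(exprs)) {
            e <- exprs[[i]]
            ## keep 'name <- function(...)' and 'name = function(...)'
            is_def <- is.call(e) &&
                (identical(e[[1L]], as.name("<-")) ||
                 identical(e[[1L]], as.name("="))) &&
                is.name(e[[2L]]) &&
                is.call(e[[3L]]) &&
                identical(e[[3L]][[1L]], as.name("function"))
            if (is_def)
                ans <- rbind(ans,
                             data.frame(fun = as.character(e[[2L]]),
                                        file = f,
                                        line = lines[i],
                                        stringsAsFactors = FALSE))
        }
    }

    ans <- ans[order(ans[["fun"]]), ]

    if (duplicates_only) {
        ## keep every row whose name occurs more than once
        dup <- ans[["fun"]] %in% ans[["fun"]][duplicated(ans[["fun"]])]
        ans <- ans[dup, ]
    }

    ans
}
```

It would be called in the same way, e.g.
top_level_funs("~/Packages/NMOF/R"). The price is that
every file must parse without error, whereas the
regular-expression version also works on broken files.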

Kind regards
        Enrico



> In other investigations of this, I realized that install.R has to
> prepare the .rdb and .rdx files and at that stage duplication might be
> detected. If install.R puts both versions of a duplicated name into
> these files, then the lazy load of library() or require() could be a
> place where detection would be useful, though only one of the names gets
> actually made available for use. However, my expertise with this
> internal aspect of R is rather weak.
>
> Cheers, JN
>
> On 16-07-23 12:04 PM, Sven E. Templer wrote:
>> Although it might help, learning/using git does not tackle this specific problem. I suggest code that does:
>> 
>> sed -e 's/^[\ \t]*//' -e 's/#.*//' R/* | awk '/function/{print $1}' | sort | uniq -d
>> 
>> or
>> 
>> https://gist.github.com/setempler/7fcf2a3a737ce1293e0623d2bb8e08ed
>> (any comments welcome)
>> 
>> If one knows how to code in R, it might be more productive to develop a tiny tool for that instead of learning a new (and complex) one (such as git).
>> 
>> Nevertheless, git is great!
>> 
>> Best wishes,
>> 
>> Sven
>> 
>> ---
>> 
>> web:     www.templer.se
>> twitter: @setempler
>>> On 23 Jul 2016, at 16:17, Hadley Wickham <h.wickham at gmail.com> wrote:
>>>
>>> I think this sort of meta problem is best solved with svn/git because you
>>> can easily see if the changes you think you made align with the changes you
>>> actually made. Learning svn or git is a lot of work, but the payoff is
>>> worth it.
>>>
>>> Hadley
>>>
>>> On Friday, July 22, 2016, ProfJCNash <profjcnash at gmail.com> wrote:
>>>
>>>> In trying to rationalize some files in a package I'm working on, I
>>>> copied a function from one file to another, but forgot to change the
>>>> name of one of them. It turns out the name of the file containing the
>>>> "old" function was later in collation sequence than the one I was
>>>> planning to be the "new" one. To debug some issues, I put some print()
>>>> and cat() statements in the "new" file, but after building the package,
>>>> they weren't there. Turns out the "old" function got installed, as might
>>>> be expected if files processed in order. Debugging this took about 2
>>>> hours of slightly weird effort with 2 machines and 3 OS distributions
>>>> before I realized the problem. It's fairly obvious that I should expect
>>>> issues in this case, but not so clear how to detect the source of the
>>>> problem.
>>>>
>>>> Question: Has anyone created a script to catch such duplicate functions
>>>> from different files during build? I think a warning message that there
>>>> are duplicate functions could save some time and effort. Maybe it's
>>>> already there, but I saw no obvious message. In this case, I'm only
>>>> working in R.
>>>>
>>>> I've found build.R in the R tarball, which is where I suspect such a
>>>> check should go, and I'm willing to prepare a patch when I figure out
>>>> how this should be done. However, it seems worth asking if anyone has
>>>> needed to do this before. I've already done some searching, but the
>>>> results seem to pick up quite different posts than I need.
>>>>
>>>> Cheers, JN
>>>>

-- 
Enrico Schumann
Lucerne, Switzerland
http://enricoschumann.net


