[R-pkg-devel] duplicate function during build
Enrico Schumann
es at enricoschumann.net
Wed Jul 27 22:42:51 CEST 2016
On Sat, 23 Jul 2016, ProfJCNash <profjcnash at gmail.com> writes:
> Thanks Sven. That indeed works. And if anyone has ideas how it could be
> put into R so Windows users could benefit, I'm sure it would be useful
> in checks of packages.
You could use R functionality to rewrite the shell
commands. Perhaps along these lines:
--8<---------------cut here---------------start------------->8---
fun_names <- function(dir,
                      duplicates_only = TRUE,
                      file_pattern = "[.][rR]$",
                      fun_pattern = " *([^\\s]+) *<- *function.*") {
    files <- dir(dir, pattern = file_pattern, full.names = TRUE)
    ans <- data.frame(fun = character(0),
                      file = character(0),
                      line = integer(0),
                      stringsAsFactors = FALSE)
    for (f in files) {
        txt <- readLines(f)
        ## perl = TRUE, so that \s is read the same way here and in gsub
        fun.lines <- grepl(fun_pattern, txt, perl = TRUE)
        if (any(fun.lines)) {
            ans <- rbind(ans,
                         data.frame(fun = gsub(fun_pattern, "\\1",
                                               txt[fun.lines],
                                               perl = TRUE),
                                    file = f,
                                    line = which(fun.lines),
                                    stringsAsFactors = FALSE))
        }
    }
    ans <- ans[order(ans[["fun"]]), ]
    if (duplicates_only) {
        ## keep the first occurrence of every duplicated name, plus its repeats
        d <- duplicated(ans[["fun"]])
        d0 <- match(unique(ans[["fun"]][d]), ans[["fun"]])
        ans <- ans[sort(c(d0, which(d))), ]
    }
    ans
}
--8<---------------cut here---------------end--------------->8---
One would then call the function on a directory.
For instance,
fun_names("~/Packages/NMOF/R")
gives me output
        fun                                    file line
10 cfHeston       /home/es/Packages/NMOF/R/callCF.R   41
18 cfHeston /home/es/Packages/NMOF/R/callHestoncf.R   29
## [...]
But it will be tricky to catch only such re-definitions
of functions that have been left in the files by
mistake. For instance, I often define short helper
functions within other functions, and such helper
functions might then get flagged, too.
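One possible way to skip such nested helpers, offered only as a
sketch: parse each file and look at top-level expressions only,
instead of matching lines of text. The function name
toplevel_fun_names below is made up for illustration. It only
recognises plain assignments with `<-` or `=`, so functions created
via assign(), setMethod() and the like would be missed, and it
reports no line numbers.

```r
## Sketch: report functions that are assigned at the top level of
## more than one place in the R files of a directory.
## 'toplevel_fun_names' is a hypothetical name, not an existing function.
toplevel_fun_names <- function(dir, file_pattern = "[.][rR]$") {
    files <- dir(dir, pattern = file_pattern, full.names = TRUE)
    ans <- data.frame(fun = character(0),
                      file = character(0),
                      stringsAsFactors = FALSE)
    for (f in files) {
        ## parse() returns one element per top-level expression
        exprs <- parse(file = f, keep.source = FALSE)
        for (e in as.list(exprs)) {
            ## keep only  name <- function(...)  or  name = function(...)
            is_fun_def <- is.call(e) &&
                (identical(e[[1L]], as.name("<-")) ||
                 identical(e[[1L]], as.name("="))) &&
                is.name(e[[2L]]) &&
                is.call(e[[3L]]) &&
                identical(e[[3L]][[1L]], as.name("function"))
            if (is_fun_def)
                ans <- rbind(ans,
                             data.frame(fun = as.character(e[[2L]]),
                                        file = f,
                                        stringsAsFactors = FALSE))
        }
    }
    ## keep every row whose name occurs more than once
    dup <- ans[["fun"]] %in% ans[["fun"]][duplicated(ans[["fun"]])]
    ans <- ans[dup, , drop = FALSE]
    ans[order(ans[["fun"]]), , drop = FALSE]
}
```

Because the files are parsed, a syntax error in any file stops the
scan, which the line-based approach would tolerate.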
Kind regards
Enrico
> In other investigations of this, I realized that install.R has to
> prepare the .rdb and .rdx files and at that stage duplication might be
> detected. If install.R puts both versions of a duplicated name into
> these files, then the lazy load of library() or require() could be a
> place where detection would be useful, though only one of the names gets
> actually made available for use. However, my expertise with this
> internal aspect of R is rather weak.
>
> Cheers, JN
>
> On 16-07-23 12:04 PM, Sven E. Templer wrote:
>> Although it might help, learning/using git does not tackle this specific problem. I suggest code that does:
>>
>> sed -e 's/^[\ \t]*//' -e 's/#.*//' R/* | awk '/function/{print $1}' | sort | uniq -d
>>
>> or
>>
>> https://gist.github.com/setempler/7fcf2a3a737ce1293e0623d2bb8e08ed
>> (any comments welcome)
>>
>> If one knows how to code in R, it might be more productive to develop a tiny tool for that, instead of learning a new (and complex) one (such as git).
>>
>> Nevertheless, git is great!
>>
>> Best wishes,
>>
>> Sven
>>
>> ---
>>
>> web: www.templer.se
>> twitter: @setempler
>>> On 23 Jul 2016, at 16:17, Hadley Wickham <h.wickham at gmail.com> wrote:
>>>
>>> I think this sort of meta problem is best solved with svn/git because you
>>> can easily see if the changes you think you made align with the changes you
>>> actually made. Learning svn or git is a lot of work, but the payoff is
>>> worth it.
>>>
>>> Hadley
>>>
>>> On Friday, July 22, 2016, ProfJCNash <profjcnash at gmail.com> wrote:
>>>
>>>> In trying to rationalize some files in a package I'm working on, I
>>>> copied a function from one file to another, but forgot to change the
>>>> name of one of them. It turns out the name of the file containing the
>>>> "old" function was later in collation sequence than the one I was
>>>> planning to be the "new" one. To debug some issues, I put some print()
>>>> and cat() statements in the "new" file, but after building the package,
>>>> they weren't there. Turns out the "old" function got installed, as might
>>>> be expected if files are processed in order. Debugging this took about 2
>>>> hours of slightly weird effort with 2 machines and 3 OS distributions
>>>> before I realized the problem. It's fairly obvious that I should expect
>>>> issues in this case, but not so clear how to detect the source of the
>>>> problem.
>>>>
>>>> Question: Has anyone created a script to catch such duplicate functions
>>>> from different files during build? I think a warning message that there
>>>> are duplicate functions could save some time and effort. Maybe it's
>>>> already there, but I saw no obvious message. In this case, I'm only
>>>> working in R.
>>>>
>>>> I've found build.R in the R tarball, which is where I suspect such a
>>>> check should go, and I'm willing to prepare a patch when I figure out
>>>> how this should be done. However, it seems worth asking if anyone has
>>>> needed to do this before. I've already done some searching, but the
>>>> results seem to pick up quite different posts than I need.
>>>>
>>>> Cheers, JN
>>>>
--
Enrico Schumann
Lucerne, Switzerland
http://enricoschumann.net