[Rd] Comments requested on "changedFiles" function

Duncan Murdoch murdoch.duncan at gmail.com
Wed Sep 4 19:53:48 CEST 2013

In a number of places internal to R, we need to know which files have 
changed (e.g. after building a vignette).  I've just written a 
general-purpose function "changedFiles" that I'll probably commit to 
R-devel.  Comments on the design (or bug reports) would be appreciated.

The source for the function and the Rd page for it are inline below.

----- changedFiles.R:
changedFiles <- function(snapshot, timestamp = tempfile("timestamp"),
                         file.info = NULL, md5sum = FALSE,
                         full.names = FALSE, ...) {
     # Helper: list the files and collect the requested data about them
     dosnapshot <- function(args) {
         fullnames <- do.call(list.files, c(full.names = TRUE, args))
         names <- do.call(list.files, c(full.names = full.names, args))
         if (isTRUE(file.info) || (is.character(file.info) &&
                                   length(file.info))) {
             info <- file.info(fullnames)
             rownames(info) <- names
             if (isTRUE(file.info))
                 file.info <- c("size", "isdir", "mode", "mtime")
         } else
             info <- data.frame(row.names = names)
         if (md5sum)
             info <- data.frame(info, md5sum = tools::md5sum(fullnames))
         list(info = info, timestamp = timestamp, file.info = file.info,
              md5sum = md5sum, full.names = full.names, args = args)
     }
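     # First mode: take the initial snapshot and return it
     if (missing(snapshot) || !inherits(snapshot, "changedFilesSnapshot")) {
         if (length(timestamp) == 1)
             file.create(timestamp)  # assumed: write the timestamp file described in the Rd page
         if (missing(snapshot)) snapshot <- "."
         pre <- dosnapshot(list(path = snapshot, ...))
         pre$pre <- pre$info
         pre$info <- NULL
         pre$wd <- getwd()
         class(pre) <- "changedFilesSnapshot"
         return(pre)
     }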

     if (missing(timestamp)) timestamp <- snapshot$timestamp
     if (missing(file.info) || isTRUE(file.info)) file.info <- snapshot$file.info
     if (identical(file.info, FALSE)) file.info <- NULL
     if (missing(md5sum))    md5sum <- snapshot$md5sum
     if (missing(full.names)) full.names <- snapshot$full.names

     pre <- snapshot$pre
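     savewd <- getwd()
     on.exit(setwd(savewd))  # assumed: restore the caller's directory on exit
     setwd(snapshot$wd)      # assumed: resolve paths as in the initial snapshot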

     args <- snapshot$args
     newargs <- list(...)
     args[names(newargs)] <- newargs
     post <- dosnapshot(args)$info
     prenames <- rownames(pre)
     postnames <- rownames(post)

     added <- setdiff(postnames, prenames)
     deleted <- setdiff(prenames, postnames)
     common <- intersect(prenames, postnames)

     if (length(file.info)) {
         preinfo <- pre[common, file.info]
         postinfo <- post[common, file.info]
         changes <- preinfo != postinfo
     } else
         changes <- matrix(logical(0), nrow = length(common), ncol = 0,
                           dimnames = list(common, character(0)))
     if (length(timestamp))
         changes <- cbind(changes,
                          Newer = file_test("-nt", common, timestamp))
     if (md5sum) {
         premd5 <- pre[common, "md5sum"]
         postmd5 <- post[common, "md5sum"]
         changes <- cbind(changes, md5sum = premd5 != postmd5)
     }
     changes1 <- changes[rowSums(changes, na.rm = TRUE) > 0, , drop = FALSE]
     changed <- rownames(changes1)
     structure(list(added = added, deleted = deleted, changed = changed,
                    unchanged = setdiff(common, changed), changes = changes),
               class = "changedFiles")
}

print.changedFilesSnapshot <- function(x, ...) {
     cat("changedFiles snapshot:\n timestamp = \"", x$timestamp,
         "\"\n file.info = ",
         # The archived line is cut off after paste0(...); the collapse
         # separator used here is an assumption.
         if (length(x$file.info))
             paste(paste0('"', x$file.info, '"'), collapse = ","),
         "\n md5sum = ", x$md5sum,
         "\n args = ", deparse(x$args, control = NULL), "\n", sep = "")
     invisible(x)
}

print.changedFiles <- function(x, ...) {
     if (length(x$added))
         cat("Files added:\n", paste0("  ", x$added, collapse = "\n"),
             "\n", sep = "")
     if (length(x$deleted))
         cat("Files deleted:\n", paste0("  ", x$deleted, collapse = "\n"),
             "\n", sep = "")
     # Keep only the rows and columns that contain at least one TRUE
     changes <- x$changes
     changes <- changes[rowSums(changes, na.rm = TRUE) > 0, , drop = FALSE]
     changes <- changes[, colSums(changes, na.rm = TRUE) > 0, drop = FALSE]
     if (nrow(changes)) {
         cat("Files changed:\n")
         # Assumed completion: the archived listing ends at the line above.
         print(changes)
     }
     invisible(x)
}
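
For anyone who wants to try the basic idea without sourcing the whole function, the following is a minimal sketch of the snapshot-and-compare step using only list.files(), setdiff()/intersect() and tools::md5sum(). It is not the code above: take_snapshot() is a hypothetical helper introduced only for this illustration, it records nothing but MD5 sums, and the file names are made up.

```r
# Sketch only: snapshot a directory, change it, and classify the files.
# take_snapshot() is a hypothetical helper, not part of changedFiles().
take_snapshot <- function(path) {
    files <- list.files(path, full.names = TRUE)
    files <- files[!file.info(files)$isdir]  # checksum regular files only
    sums <- tools::md5sum(files)
    names(sums) <- basename(files)           # index the sums by file name
    sums
}

dir <- tempfile("snapdemo")
dir.create(dir)
writeLines("one", file.path(dir, "a.txt"))
writeLines("two", file.path(dir, "b.txt"))

pre <- take_snapshot(dir)

writeLines("changed", file.path(dir, "b.txt"))  # modify a file
unlink(file.path(dir, "a.txt"))                 # delete a file
writeLines("three", file.path(dir, "c.txt"))    # add a file

post <- take_snapshot(dir)

added   <- setdiff(names(post), names(pre))
deleted <- setdiff(names(pre), names(post))
common  <- intersect(names(pre), names(post))
changed <- common[pre[common] != post[common]]
```

Indexing both snapshots by file name is what lets the set operations classify files as added, deleted or common, and only the common files need a content comparison.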

--- changedFiles.Rd:
\name{changedFiles}
\alias{changedFiles}
\title{Detect which files have changed}
\description{
On the first call, \code{changedFiles} takes a snapshot of a selection
of files.  In subsequent calls, it takes another snapshot, and returns
an object containing data on the differences between the two snapshots.
The snapshots need not be of the same directory; this could be used to
compare two directories.
}
\usage{
changedFiles(snapshot, timestamp = tempfile("timestamp"), file.info = NULL,
             md5sum = FALSE, full.names = FALSE, ...)
}
\arguments{
  \item{snapshot}{
The path to record, or a previous snapshot.  See the Details.
}
  \item{timestamp}{
The name of a file to write at the time the initial snapshot is taken.
In subsequent calls, modification times of files will be compared to
this file, and newer files will be reported as changed.  Set to
\code{NULL} to skip this test.
}
  \item{file.info}{
A vector of columns from the result of the \code{file.info} function, or
a logical value.  If \code{TRUE}, columns
\code{c("size", "isdir", "mode", "mtime")} will be used.  Set to
\code{FALSE} or \code{NULL} to skip this test.  See the Details.
}
  \item{md5sum}{
A logical value indicating whether MD5 summaries should be taken as part
of the snapshot.
}
  \item{full.names}{
A logical value indicating whether full names (as in
\code{\link{list.files}}) should be recorded.
}
  \item{\dots}{
Additional parameters to pass to \code{\link{list.files}} to control the
set of files in the snapshots.
}
}
\details{
This function works in two modes.  If the \code{snapshot} argument is
missing or is not of S3 class \code{"changedFilesSnapshot"}, it is used
as the \code{path} argument to \code{\link{list.files}} to obtain a list
of files.  If it is of class \code{"changedFilesSnapshot"}, then it is
taken to be the baseline snapshot, and a new snapshot is taken and
compared with it.  In the latter case, missing arguments default to
match those from the initial snapshot.

If the \code{timestamp} argument is length 1, a file with that name is
created in the current directory during the initial snapshot, and the
modification times of all files are compared to it during subsequent
calls.

If the \code{file.info} argument is \code{TRUE} or it contains a non-empty
character vector, the indicated columns from the result of a call to
\code{\link{file.info}} will be recorded and compared.

If \code{md5sum} is \code{TRUE}, the \code{tools::\link{md5sum}} function
will be called to record the 32-character hexadecimal MD5 checksum for
each file, and these values will be compared.
}
\value{
In the initial snapshot phase, an object of class
\code{"changedFilesSnapshot"} is returned.  This is a list containing
the fields
\item{pre}{a data frame whose row names are the file names, and whose
columns contain the requested snapshot data}
\item{timestamp, file.info, md5sum, full.names}{a record of the
arguments in the initial call}
\item{args}{other arguments passed via \code{...} to
\code{\link{list.files}}}

In the comparison phase, an object of class \code{"changedFiles"} is
returned.  This is a list containing
\item{added, deleted, changed, unchanged}{character vectors of file
names from the before and after snapshots: files present only in the
second snapshot, files present only in the first, and files present in
both that did or did not fail a comparison test}
\item{changes}{a logical matrix with a row for each common file, and a
column for each comparison test.  \code{TRUE} indicates a change in
that test.}

\code{\link{print}} methods are defined for each of these types.  The
\code{\link{print}} method for \code{"changedFilesSnapshot"} objects
displays the arguments used to produce it, while the one for
\code{"changedFiles"} displays the \code{added}, \code{deleted}
and \code{changed} fields if non-empty, and a submatrix of the
\code{changes} matrix containing all of the \code{TRUE} values.
}
\author{
Duncan Murdoch
}
\seealso{
\code{\link{file.info}}, \code{\link{file_test}}, \code{\link{md5sum}}.
}
\examples{
# Create some files in a temporary directory
dir <- tempfile()
dir.create(dir)
writeBin(1, file.path(dir, "file1"))
writeBin(2, file.path(dir, "file2"))
dir.create(file.path(dir, "dir"))

# Take a snapshot
snapshot <- changedFiles(dir, file.info=TRUE, md5sum=TRUE)

# Change one of the files
writeBin(3, file.path(dir, "file2"))

# Display the detected changes
changedFiles(snapshot)
}
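
The timestamp test described in the Details can be tried on its own. The sketch below assumes nothing beyond base R and utils::file_test(); the file names are invented for the illustration, and the modification times are set explicitly with Sys.setFileTime() so that the outcome does not depend on the file system's timestamp resolution.

```r
# Sketch only: flag files modified after a timestamp file was written.
dir <- tempfile("tsdemo")
dir.create(dir)
old   <- file.path(dir, "old.txt")
new   <- file.path(dir, "new.txt")
stamp <- tempfile("timestamp")

writeLines("old", old)
file.create(stamp)
writeLines("new", new)

# Force well-separated modification times: old < stamp < new
now <- Sys.time()
Sys.setFileTime(old,   now - 60)
Sys.setFileTime(stamp, now - 30)
Sys.setFileTime(new,   now)

# "-nt" is TRUE where the first file is newer than the second
newer <- utils::file_test("-nt", c(old, new), stamp)
names(newer) <- basename(c(old, new))
print(newer)
```

A test like this depends only on modification times, so it is cheap, but a file that is rewritten with identical contents will still be reported as newer; the md5sum comparison is the one that looks at contents.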