[Rd] Comments requested on "changedFiles" function

Duncan Murdoch murdoch.duncan at gmail.com
Thu Sep 5 12:48:47 CEST 2013


On 13-09-04 11:36 PM, Scott Kostyshak wrote:
> On Wed, Sep 4, 2013 at 1:53 PM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>> In a number of places internal to R, we need to know which files have
>> changed (e.g. after building a vignette).  I've just written a general
>> purpose function "changedFiles" that I'll probably commit to R-devel.
>> Comments on the design (or bug reports) would be appreciated.
>>
>> The source for the function and the Rd page for it are inline below.
>
> This looks like a useful function. Thanks for writing it. I have only
> one (picky) comment below.
>
>> ----- changedFiles.R:
>> changedFiles <- function(snapshot, timestamp = tempfile("timestamp"),
>> file.info = NULL,
>>               md5sum = FALSE, full.names = FALSE, ...) {
>>      dosnapshot <- function(args) {
>>          fullnames <- do.call(list.files, c(full.names = TRUE, args))
>>          names <- do.call(list.files, c(full.names = full.names, args))
>>          if (isTRUE(file.info) || (is.character(file.info) &&
>> length(file.info))) {
>>              info <- file.info(fullnames)
>>          rownames(info) <- names
>>              if (isTRUE(file.info))
>>                  file.info <- c("size", "isdir", "mode", "mtime")
>>          } else
>>              info <- data.frame(row.names=names)
>>      if (md5sum)
>>          info <- data.frame(info, md5sum = tools::md5sum(fullnames))
>>      list(info = info, timestamp = timestamp, file.info = file.info,
>>           md5sum = md5sum, full.names = full.names, args = args)
>>      }
>>      if (missing(snapshot) || !inherits(snapshot, "changedFilesSnapshot")) {
>>          if (length(timestamp) == 1)
>>              file.create(timestamp)
>>          if (missing(snapshot)) snapshot <- "."
>>          pre <- dosnapshot(list(path = snapshot, ...))
>>          pre$pre <- pre$info
>>          pre$info <- NULL
>>          pre$wd <- getwd()
>>          class(pre) <- "changedFilesSnapshot"
>>          return(pre)
>>      }
>>
>>      if (missing(timestamp)) timestamp <- snapshot$timestamp
>>      if (missing(file.info) || isTRUE(file.info)) file.info <-
>> snapshot$file.info
>>      if (identical(file.info, FALSE)) file.info <- NULL
>>      if (missing(md5sum))    md5sum <- snapshot$md5sum
>>      if (missing(full.names)) full.names <- snapshot$full.names
>>
>>      pre <- snapshot$pre
>>      savewd <- getwd()
>>      on.exit(setwd(savewd))
>>      setwd(snapshot$wd)
>>
>>      args <- snapshot$args
>>      newargs <- list(...)
>>      args[names(newargs)] <- newargs
>>      post <- dosnapshot(args)$info
>>      prenames <- rownames(pre)
>>      postnames <- rownames(post)
>>
>>      added <- setdiff(postnames, prenames)
>>      deleted <- setdiff(prenames, postnames)
>>      common <- intersect(prenames, postnames)
>>
>>      if (length(file.info)) {
>>          preinfo <- pre[common, file.info]
>>          postinfo <- post[common, file.info]
>>          changes <- preinfo != postinfo
>>      }
>>      else changes <- matrix(logical(0), nrow = length(common), ncol = 0,
>>                             dimnames = list(common, character(0)))
>>      if (length(timestamp))
>>          changes <- cbind(changes, Newer = file_test("-nt", common,
>> timestamp))
>>      if (md5sum) {
>>          premd5 <- pre[common, "md5sum"]
>>          postmd5 <- post[common, "md5sum"]
>>      changes <- cbind(changes, md5sum = premd5 != postmd5)
>>      }
>>      changes1 <- changes[rowSums(changes, na.rm = TRUE) > 0, , drop = FALSE]
>>      changed <- rownames(changes1)
>>      structure(list(added = added, deleted = deleted, changed = changed,
>>          unchanged = setdiff(common, changed), changes = changes), class =
>> "changedFiles")
>> }
>>
>> print.changedFilesSnapshot <- function(x, ...) {
>>      cat("changedFiles snapshot:\n timestamp = \"", x$timestamp, "\"\n
>> file.info = ",
>>          if (length(x$file.info)) paste(paste0('"', x$file.info, '"'),
>> collapse=","),
>>          "\n md5sum = ", x$md5sum, "\n args = ", deparse(x$args, control =
>> NULL), "\n", sep="")
>>      x
>> }
>>
>> print.changedFiles <- function(x, ...) {
>>      if (length(x$added)) cat("Files added:\n",  paste0("  ", x$added,
>> collapse="\n"), "\n", sep="")
>>      if (length(x$deleted)) cat("Files deleted:\n",  paste0("  ", x$deleted,
>> collapse="\n"), "\n", sep="")
>>      changes <- x$changes
>>      changes <- changes[rowSums(changes, na.rm = TRUE) > 0, , drop=FALSE]
>>      changes <- changes[, colSums(changes, na.rm = TRUE) > 0, drop=FALSE]
>>      if (nrow(changes)) {
>>          cat("Files changed:\n")
>>          print(changes)
>>      }
>>      x
>> }
>> ----------------------
>>
>> --- changedFiles.Rd:
>> \name{changedFiles}
>> \alias{changedFiles}
>> \alias{print.changedFiles}
>> \alias{print.changedFilesSnapshot}
>> \title{
>> Detect which files have changed
>> }
>> \description{
>> On the first call, \code{changedFiles} takes a snapshot of a selection of
>> files.  In subsequent
>> calls, it takes another snapshot, and returns an object containing data on
>> the
>> differences between the two snapshots.  The snapshots need not be the same
>> directory;
>> this could be used to compare two directories.
>> }
>> \usage{
>> changedFiles(snapshot, timestamp = tempfile("timestamp"), file.info = NULL,
>>               md5sum = FALSE, full.names = FALSE, ...)
>> }
>> \arguments{
>>    \item{snapshot}{
>> The path to record, or a previous snapshot.  See the Details.
>> }
>>    \item{timestamp}{
>> The name of a file to write at the time the initial snapshot
>> is taken.  In subsequent calls, modification times of files will be compared
>> to
>> this file, and newer files will be reported as changed.  Set to \code{NULL}
>> to skip this test.
>> }
>>    \item{file.info}{
>> A vector of columns from the result of the \code{file.info} function, or a
>> logical value.  If
>> \code{TRUE}, columns \code{c("size", "isdir", "mode", "mtime")} will be
>> used.  Set to
>> \code{FALSE} or \code{NULL} to skip this test.  See the Details.
>> }
>>    \item{md5sum}{
>> A logical value indicating whether MD5 summaries should be taken as part of
>> the snapshot.
>> }
>>    \item{full.names}{
>> A logical value indicating whether full names (as in
>> \code{\link{list.files}}) should be
>> recorded.
>> }
>>    \item{\dots}{
>> Additional parameters to pass to \code{\link{list.files}} to control the set
>> of files
>> in the snapshots.
>> }
>> }
>> \details{
>> This function works in two modes.  If the \code{snapshot} argument is
>> missing or is
>> not of S3 class \code{"changedFilesSnapshot"}, it is used as the \code{path}
>> argument
>> to \code{\link{list.files}} to obtain a list of files.  If it is of class
>> \code{"changedFilesSnapshot"}, then it is taken to be the baseline file
>> and a new snapshot is taken and compared with it.  In the latter case,
>> missing
>> arguments default to match those from the initial snapshot.
>>
>> If the \code{timestamp} argument is length 1, a file with that name is
>> created
>> in the current directory during the initial snapshot, and
>> \code{\link{file_test}}
>> is used to compare the age of all files to it during subsequent calls.
>>
>> If the \code{file.info} argument is \code{TRUE} or it contains a non-empty
>> character vector, the indicated columns from the result of a call to
>> \code{\link{file.info}} will be recorded and compared.
>>
>> If \code{md5sum} is \code{TRUE}, the \code{tools::\link{md5sum}} function
>> will be called to record the 32 byte MD5 checksum for each file, and these
>> values
>> will be compared.
>> }
>> \value{
>> In the initial snapshot phase, an object of class
>> \code{"changedFilesSnapshot"} is returned.  This
>> is a list containing the fields
>> \item{pre}{a dataframe whose rownames are the filenames, and whose columns
>> contain the
>> requested snapshot data}
>> \item{timestamp, file.info, md5sum, full.names}{a record of the arguments in
>> the initial call}
>> \item{args}{other arguments passed via \code{...} to
>> \code{\link{list.files}}.}
>>
>> In the comparison phase, an object of class \code{"changedFiles"}. This is a
>> list containing
>> \item{added, deleted, changed, unchanged}{character vectors of filenames
>> from the before
>> and after snapshots, with obvious meanings}
>> \item{changes}{a logical matrix with a row for each common file, and a
>> column for each
>> comparison test.  \code{TRUE} indicates a change in that test.}
>>
>> \code{\link{print}} methods are defined for each of these types. The
>> \code{\link{print}} method for \code{"changedFilesSnapshot"} objects
>> displays the arguments used to produce it, while the one for
>> \code{"changedFiles"} displays the \code{added}, \code{deleted}
>> and \code{changed} fields if non-empty, and a submatrix of the
>> \code{changes}
>> matrix containing all of the \code{TRUE} values.
>> }
>> \author{
>> Duncan Murdoch
>> }
>> \seealso{
>> \code{\link{file.info}}, \code{\link{file_test}}, \code{\link{md5sum}}.
>> }
>> \examples{
>> # Create some files in a temporary directory
>> dir <- tempfile()
>> dir.create(dir)
>
> Should a different name than 'dir' be used since 'dir' is a base function?

Such as?

> Further, if someone is not very familiar with R (or just not in "R
> mode" at the time of reading), they might think that 'dir.create' is
> calling the create member of the object named 'dir' that you just
> made.

dir.create is an existing function.  I wouldn't have named it that, but 
that's its name.

Duncan Murdoch

>
> Scott
>
>> writeBin(1, file.path(dir, "file1"))
>> writeBin(2, file.path(dir, "file2"))
>> dir.create(file.path(dir, "dir"))
>>
>> # Take a snapshot
>> snapshot <- changedFiles(dir, file.info=TRUE, md5sum=TRUE)
>>
>> # Change one of the files
>> writeBin(3, file.path(dir, "file2"))
>>
>> # Display the detected changes
>> changedFiles(snapshot)
>> changedFiles(snapshot)$changes
>> }
>> \keyword{utilities}
>> \keyword{file}
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
> --
> Scott Kostyshak
> Economics PhD Candidate
> Princeton University
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>



More information about the R-devel mailing list