[Rd] Comments requested on "changedFiles" function
Scott Kostyshak
skostysh at princeton.edu
Thu Sep 5 05:36:45 CEST 2013
On Wed, Sep 4, 2013 at 1:53 PM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
> In a number of places internal to R, we need to know which files have
> changed (e.g. after building a vignette). I've just written a general
> purpose function "changedFiles" that I'll probably commit to R-devel.
> Comments on the design (or bug reports) would be appreciated.
>
> The source for the function and the Rd page for it are inline below.
This looks like a useful function. Thanks for writing it. I have only
one (picky) comment below.
> ----- changedFiles.R:
> changedFiles <- function(snapshot, timestamp = tempfile("timestamp"),
> file.info = NULL,
> md5sum = FALSE, full.names = FALSE, ...) {
> dosnapshot <- function(args) {
> fullnames <- do.call(list.files, c(full.names = TRUE, args))
> names <- do.call(list.files, c(full.names = full.names, args))
> if (isTRUE(file.info) || (is.character(file.info) &&
> length(file.info))) {
> info <- file.info(fullnames)
> rownames(info) <- names
> if (isTRUE(file.info))
> file.info <- c("size", "isdir", "mode", "mtime")
> } else
> info <- data.frame(row.names=names)
> if (md5sum)
> info <- data.frame(info, md5sum = tools::md5sum(fullnames))
> list(info = info, timestamp = timestamp, file.info = file.info,
> md5sum = md5sum, full.names = full.names, args = args)
> }
> if (missing(snapshot) || !inherits(snapshot, "changedFilesSnapshot")) {
> if (length(timestamp) == 1)
> file.create(timestamp)
> if (missing(snapshot)) snapshot <- "."
> pre <- dosnapshot(list(path = snapshot, ...))
> pre$pre <- pre$info
> pre$info <- NULL
> pre$wd <- getwd()
> class(pre) <- "changedFilesSnapshot"
> return(pre)
> }
>
> if (missing(timestamp)) timestamp <- snapshot$timestamp
> if (missing(file.info) || isTRUE(file.info)) file.info <-
> snapshot$file.info
> if (identical(file.info, FALSE)) file.info <- NULL
> if (missing(md5sum)) md5sum <- snapshot$md5sum
> if (missing(full.names)) full.names <- snapshot$full.names
>
> pre <- snapshot$pre
> savewd <- getwd()
> on.exit(setwd(savewd))
> setwd(snapshot$wd)
>
> args <- snapshot$args
> newargs <- list(...)
> args[names(newargs)] <- newargs
> post <- dosnapshot(args)$info
> prenames <- rownames(pre)
> postnames <- rownames(post)
>
> added <- setdiff(postnames, prenames)
> deleted <- setdiff(prenames, postnames)
> common <- intersect(prenames, postnames)
>
> if (length(file.info)) {
> preinfo <- pre[common, file.info]
> postinfo <- post[common, file.info]
> changes <- preinfo != postinfo
> }
> else changes <- matrix(logical(0), nrow = length(common), ncol = 0,
> dimnames = list(common, character(0)))
> if (length(timestamp))
> changes <- cbind(changes, Newer = file_test("-nt", common,
> timestamp))
> if (md5sum) {
> premd5 <- pre[common, "md5sum"]
> postmd5 <- post[common, "md5sum"]
> changes <- cbind(changes, md5sum = premd5 != postmd5)
> }
> changes1 <- changes[rowSums(changes, na.rm = TRUE) > 0, , drop = FALSE]
> changed <- rownames(changes1)
> structure(list(added = added, deleted = deleted, changed = changed,
> unchanged = setdiff(common, changed), changes = changes), class =
> "changedFiles")
> }
>
> print.changedFilesSnapshot <- function(x, ...) {
> cat("changedFiles snapshot:\n timestamp = \"", x$timestamp, "\"\n
> file.info = ",
> if (length(x$file.info)) paste(paste0('"', x$file.info, '"'),
> collapse=","),
> "\n md5sum = ", x$md5sum, "\n args = ", deparse(x$args, control =
> NULL), "\n", sep="")
> x
> }
>
> print.changedFiles <- function(x, ...) {
> if (length(x$added)) cat("Files added:\n", paste0(" ", x$added,
> collapse="\n"), "\n", sep="")
> if (length(x$deleted)) cat("Files deleted:\n", paste0(" ", x$deleted,
> collapse="\n"), "\n", sep="")
> changes <- x$changes
> changes <- changes[rowSums(changes, na.rm = TRUE) > 0, , drop=FALSE]
> changes <- changes[, colSums(changes, na.rm = TRUE) > 0, drop=FALSE]
> if (nrow(changes)) {
> cat("Files changed:\n")
> print(changes)
> }
> x
> }
> ----------------------
>
> --- changedFiles.Rd:
> \name{changedFiles}
> \alias{changedFiles}
> \alias{print.changedFiles}
> \alias{print.changedFilesSnapshot}
> \title{
> Detect which files have changed
> }
> \description{
> On the first call, \code{changedFiles} takes a snapshot of a selection of
> files. In subsequent
> calls, it takes another snapshot, and returns an object containing data on
> the
> differences between the two snapshots. The snapshots need not be the same
> directory;
> this could be used to compare two directories.
> }
> \usage{
> changedFiles(snapshot, timestamp = tempfile("timestamp"), file.info = NULL,
> md5sum = FALSE, full.names = FALSE, ...)
> }
> \arguments{
> \item{snapshot}{
> The path to record, or a previous snapshot. See the Details.
> }
> \item{timestamp}{
> The name of a file to write at the time the initial snapshot
> is taken. In subsequent calls, modification times of files will be compared
> to
> this file, and newer files will be reported as changed. Set to \code{NULL}
> to skip this test.
> }
> \item{file.info}{
> A vector of columns from the result of the \code{file.info} function, or a
> logical value. If
> \code{TRUE}, columns \code{c("size", "isdir", "mode", "mtime")} will be
> used. Set to
> \code{FALSE} or \code{NULL} to skip this test. See the Details.
> }
> \item{md5sum}{
> A logical value indicating whether MD5 summaries should be taken as part of
> the snapshot.
> }
> \item{full.names}{
> A logical value indicating whether full names (as in
> \code{\link{list.files}}) should be
> recorded.
> }
> \item{\dots}{
> Additional parameters to pass to \code{\link{list.files}} to control the set
> of files
> in the snapshots.
> }
> }
> \details{
> This function works in two modes. If the \code{snapshot} argument is
> missing or is
> not of S3 class \code{"changedFilesSnapshot"}, it is used as the \code{path}
> argument
> to \code{\link{list.files}} to obtain a list of files. If it is of class
> \code{"changedFilesSnapshot"}, then it is taken to be the baseline file
> and a new snapshot is taken and compared with it. In the latter case,
> missing
> arguments default to match those from the initial snapshot.
>
> If the \code{timestamp} argument is length 1, a file with that name is
> created
> in the current directory during the initial snapshot, and
> \code{\link{file_test}}
> is used to compare the age of all files to it during subsequent calls.
>
> If the \code{file.info} argument is \code{TRUE} or it contains a non-empty
> character vector, the indicated columns from the result of a call to
> \code{\link{file.info}} will be recorded and compared.
>
> If \code{md5sum} is \code{TRUE}, the \code{tools::\link{md5sum}} function
> will be called to record the 32 byte MD5 checksum for each file, and these
> values
> will be compared.
> }
> \value{
> In the initial snapshot phase, an object of class
> \code{"changedFilesSnapshot"} is returned. This
> is a list containing the fields
> \item{pre}{a dataframe whose rownames are the filenames, and whose columns
> contain the
> requested snapshot data}
> \item{timestamp, file.info, md5sum, full.names}{a record of the arguments in
> the initial call}
> \item{args}{other arguments passed via \code{...} to
> \code{\link{list.files}}.}
>
> In the comparison phase, an object of class \code{"changedFiles"}. This is a
> list containing
> \item{added, deleted, changed, unchanged}{character vectors of filenames
> from the before
> and after snapshots, with obvious meanings}
> \item{changes}{a logical matrix with a row for each common file, and a
> column for each
> comparison test. \code{TRUE} indicates a change in that test.}
>
> \code{\link{print}} methods are defined for each of these types. The
> \code{\link{print}} method for \code{"changedFilesSnapshot"} objects
> displays the arguments used to produce it, while the one for
> \code{"changedFiles"} displays the \code{added}, \code{deleted}
> and \code{changed} fields if non-empty, and a submatrix of the
> \code{changes}
> matrix containing all of the \code{TRUE} values.
> }
> \author{
> Duncan Murdoch
> }
> \seealso{
> \code{\link{file.info}}, \code{\link{file_test}}, \code{\link{md5sum}}.
> }
> \examples{
> # Create some files in a temporary directory
> dir <- tempfile()
> dir.create(dir)
Should a different name than 'dir' be used since 'dir' is a base function?
Further, if someone is not very familiar with R (or just not in "R
mode" at the time of reading), they might think that 'dir.create' is
calling the create member of the object named 'dir' that you just
made.
Scott
> writeBin(1, file.path(dir, "file1"))
> writeBin(2, file.path(dir, "file2"))
> dir.create(file.path(dir, "dir"))
>
> # Take a snapshot
> snapshot <- changedFiles(dir, file.info=TRUE, md5sum=TRUE)
>
> # Change one of the files
> writeBin(3, file.path(dir, "file2"))
>
> # Display the detected changes
> changedFiles(snapshot)
> changedFiles(snapshot)$changes
> }
> \keyword{utilities}
> \keyword{file}
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
--
Scott Kostyshak
Economics PhD Candidate
Princeton University
More information about the R-devel
mailing list