[Rd] Comments requested on "changedFiles" function
Scott Kostyshak
skostysh at princeton.edu
Thu Sep 5 23:13:57 CEST 2013
On Thu, Sep 5, 2013 at 6:48 AM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
> On 13-09-04 11:36 PM, Scott Kostyshak wrote:
>>
>> On Wed, Sep 4, 2013 at 1:53 PM, Duncan Murdoch <murdoch.duncan at gmail.com>
>> wrote:
>>>
>>> In a number of places internal to R, we need to know which files have
>>> changed (e.g. after building a vignette). I've just written a general
>>> purpose function "changedFiles" that I'll probably commit to R-devel.
>>> Comments on the design (or bug reports) would be appreciated.
>>>
>>> The source for the function and the Rd page for it are inline below.
>>
>>
>> This looks like a useful function. Thanks for writing it. I have only
>> one (picky) comment below.
>>
>>> ----- changedFiles.R:
>>> changedFiles <- function(snapshot, timestamp = tempfile("timestamp"),
>>> file.info = NULL,
>>> md5sum = FALSE, full.names = FALSE, ...) {
>>> dosnapshot <- function(args) {
>>> fullnames <- do.call(list.files, c(full.names = TRUE, args))
>>> names <- do.call(list.files, c(full.names = full.names, args))
>>> if (isTRUE(file.info) || (is.character(file.info) &&
>>> length(file.info))) {
>>> info <- file.info(fullnames)
>>> rownames(info) <- names
>>> if (isTRUE(file.info))
>>> file.info <- c("size", "isdir", "mode", "mtime")
>>> } else
>>> info <- data.frame(row.names=names)
>>> if (md5sum)
>>> info <- data.frame(info, md5sum = tools::md5sum(fullnames))
>>> list(info = info, timestamp = timestamp, file.info = file.info,
>>> md5sum = md5sum, full.names = full.names, args = args)
>>> }
>>> if (missing(snapshot) || !inherits(snapshot,
>>> "changedFilesSnapshot")) {
>>> if (length(timestamp) == 1)
>>> file.create(timestamp)
>>> if (missing(snapshot)) snapshot <- "."
>>> pre <- dosnapshot(list(path = snapshot, ...))
>>> pre$pre <- pre$info
>>> pre$info <- NULL
>>> pre$wd <- getwd()
>>> class(pre) <- "changedFilesSnapshot"
>>> return(pre)
>>> }
>>>
>>> if (missing(timestamp)) timestamp <- snapshot$timestamp
>>> if (missing(file.info) || isTRUE(file.info)) file.info <-
>>> snapshot$file.info
>>> if (identical(file.info, FALSE)) file.info <- NULL
>>> if (missing(md5sum)) md5sum <- snapshot$md5sum
>>> if (missing(full.names)) full.names <- snapshot$full.names
>>>
>>> pre <- snapshot$pre
>>> savewd <- getwd()
>>> on.exit(setwd(savewd))
>>> setwd(snapshot$wd)
>>>
>>> args <- snapshot$args
>>> newargs <- list(...)
>>> args[names(newargs)] <- newargs
>>> post <- dosnapshot(args)$info
>>> prenames <- rownames(pre)
>>> postnames <- rownames(post)
>>>
>>> added <- setdiff(postnames, prenames)
>>> deleted <- setdiff(prenames, postnames)
>>> common <- intersect(prenames, postnames)
>>>
>>> if (length(file.info)) {
>>> preinfo <- pre[common, file.info]
>>> postinfo <- post[common, file.info]
>>> changes <- preinfo != postinfo
>>> }
>>> else changes <- matrix(logical(0), nrow = length(common), ncol = 0,
>>> dimnames = list(common, character(0)))
>>> if (length(timestamp))
>>> changes <- cbind(changes, Newer = file_test("-nt", common,
>>> timestamp))
>>> if (md5sum) {
>>> premd5 <- pre[common, "md5sum"]
>>> postmd5 <- post[common, "md5sum"]
>>> changes <- cbind(changes, md5sum = premd5 != postmd5)
>>> }
>>> changes1 <- changes[rowSums(changes, na.rm = TRUE) > 0, , drop =
>>> FALSE]
>>> changed <- rownames(changes1)
>>> structure(list(added = added, deleted = deleted, changed = changed,
>>> unchanged = setdiff(common, changed), changes = changes), class
>>> =
>>> "changedFiles")
>>> }
>>>
>>> print.changedFilesSnapshot <- function(x, ...) {
>>> cat("changedFiles snapshot:\n timestamp = \"", x$timestamp, "\"\n
>>> file.info = ",
>>> if (length(x$file.info)) paste(paste0('"', x$file.info, '"'),
>>> collapse=","),
>>> "\n md5sum = ", x$md5sum, "\n args = ", deparse(x$args, control
>>> =
>>> NULL), "\n", sep="")
>>> x
>>> }
>>>
>>> print.changedFiles <- function(x, ...) {
>>> if (length(x$added)) cat("Files added:\n", paste0(" ", x$added,
>>> collapse="\n"), "\n", sep="")
>>> if (length(x$deleted)) cat("Files deleted:\n", paste0(" ",
>>> x$deleted,
>>> collapse="\n"), "\n", sep="")
>>> changes <- x$changes
>>> changes <- changes[rowSums(changes, na.rm = TRUE) > 0, , drop=FALSE]
>>> changes <- changes[, colSums(changes, na.rm = TRUE) > 0, drop=FALSE]
>>> if (nrow(changes)) {
>>> cat("Files changed:\n")
>>> print(changes)
>>> }
>>> x
>>> }
>>> ----------------------
>>>
>>> --- changedFiles.Rd:
>>> \name{changedFiles}
>>> \alias{changedFiles}
>>> \alias{print.changedFiles}
>>> \alias{print.changedFilesSnapshot}
>>> \title{
>>> Detect which files have changed
>>> }
>>> \description{
>>> On the first call, \code{changedFiles} takes a snapshot of a selection of
>>> files. In subsequent
>>> calls, it takes another snapshot, and returns an object containing data
>>> on
>>> the
>>> differences between the two snapshots. The snapshots need not be the
>>> same
>>> directory;
>>> this could be used to compare two directories.
>>> }
>>> \usage{
>>> changedFiles(snapshot, timestamp = tempfile("timestamp"), file.info =
>>> NULL,
>>> md5sum = FALSE, full.names = FALSE, ...)
>>> }
>>> \arguments{
>>> \item{snapshot}{
>>> The path to record, or a previous snapshot. See the Details.
>>> }
>>> \item{timestamp}{
>>> The name of a file to write at the time the initial snapshot
>>> is taken. In subsequent calls, modification times of files will be
>>> compared
>>> to
>>> this file, and newer files will be reported as changed. Set to
>>> \code{NULL}
>>> to skip this test.
>>> }
>>> \item{file.info}{
>>> A vector of columns from the result of the \code{file.info} function, or
>>> a
>>> logical value. If
>>> \code{TRUE}, columns \code{c("size", "isdir", "mode", "mtime")} will be
>>> used. Set to
>>> \code{FALSE} or \code{NULL} to skip this test. See the Details.
>>> }
>>> \item{md5sum}{
>>> A logical value indicating whether MD5 summaries should be taken as part
>>> of
>>> the snapshot.
>>> }
>>> \item{full.names}{
>>> A logical value indicating whether full names (as in
>>> \code{\link{list.files}}) should be
>>> recorded.
>>> }
>>> \item{\dots}{
>>> Additional parameters to pass to \code{\link{list.files}} to control the
>>> set
>>> of files
>>> in the snapshots.
>>> }
>>> }
>>> \details{
>>> This function works in two modes. If the \code{snapshot} argument is
>>> missing or is
>>> not of S3 class \code{"changedFilesSnapshot"}, it is used as the
>>> \code{path}
>>> argument
>>> to \code{\link{list.files}} to obtain a list of files. If it is of class
>>> \code{"changedFilesSnapshot"}, then it is taken to be the baseline file
>>> and a new snapshot is taken and compared with it. In the latter case,
>>> missing
>>> arguments default to match those from the initial snapshot.
>>>
>>> If the \code{timestamp} argument is length 1, a file with that name is
>>> created
>>> in the current directory during the initial snapshot, and
>>> \code{\link{file_test}}
>>> is used to compare the age of all files to it during subsequent calls.
>>>
>>> If the \code{file.info} argument is \code{TRUE} or it contains a
>>> non-empty
>>> character vector, the indicated columns from the result of a call to
>>> \code{\link{file.info}} will be recorded and compared.
>>>
>>> If \code{md5sum} is \code{TRUE}, the \code{tools::\link{md5sum}} function
>>> will be called to record the 32 byte MD5 checksum for each file, and
>>> these
>>> values
>>> will be compared.
>>> }
>>> \value{
>>> In the initial snapshot phase, an object of class
>>> \code{"changedFilesSnapshot"} is returned. This
>>> is a list containing the fields
>>> \item{pre}{a dataframe whose rownames are the filenames, and whose
>>> columns
>>> contain the
>>> requested snapshot data}
>>> \item{timestamp, file.info, md5sum, full.names}{a record of the arguments
>>> in
>>> the initial call}
>>> \item{args}{other arguments passed via \code{...} to
>>> \code{\link{list.files}}.}
>>>
>>> In the comparison phase, an object of class \code{"changedFiles"}. This
>>> is a
>>> list containing
>>> \item{added, deleted, changed, unchanged}{character vectors of filenames
>>> from the before
>>> and after snapshots, with obvious meanings}
>>> \item{changes}{a logical matrix with a row for each common file, and a
>>> column for each
>>> comparison test. \code{TRUE} indicates a change in that test.}
>>>
>>> \code{\link{print}} methods are defined for each of these types. The
>>> \code{\link{print}} method for \code{"changedFilesSnapshot"} objects
>>> displays the arguments used to produce it, while the one for
>>> \code{"changedFiles"} displays the \code{added}, \code{deleted}
>>> and \code{changed} fields if non-empty, and a submatrix of the
>>> \code{changes}
>>> matrix containing all of the \code{TRUE} values.
>>> }
>>> \author{
>>> Duncan Murdoch
>>> }
>>> \seealso{
>>> \code{\link{file.info}}, \code{\link{file_test}}, \code{\link{md5sum}}.
>>> }
>>> \examples{
>>> # Create some files in a temporary directory
>>> dir <- tempfile()
>>> dir.create(dir)
>>
>>
>> Should a different name than 'dir' be used since 'dir' is a base function?
>
>
> Such as?
'dir_', 'dir1', 'temp_dir', none of which is a base function. I
thought that it was not recommended to create objects with the same
name as functions, but perhaps this recommended practice is not agreed
on.
>> Further, if someone is not very familiar with R (or just not in "R
>> mode" at the time of reading), they might think that 'dir.create' is
>> calling the create member of the object named 'dir' that you just
>> made.
>
>
> dir.create is an existing function. I wouldn't have named it that, but
> that's its name.
I meant that if the object is called, e.g. 'temp_dir', one will not
think that 'dir.create' is a call to the 'create' member of 'dir'
because there is no 'dir' object apart from the base function. But
anyone with experience in R would know that this is not how R parses
'dir.create'.
In any case, I shouldn't waste your time on such a minor and subjective thing.
Scott
> Duncan Murdoch
>
>
>>
>> Scott
>>
>>> writeBin(1, file.path(dir, "file1"))
>>> writeBin(2, file.path(dir, "file2"))
>>> dir.create(file.path(dir, "dir"))
>>>
>>> # Take a snapshot
>>> snapshot <- changedFiles(dir, file.info=TRUE, md5sum=TRUE)
>>>
>>> # Change one of the files
>>> writeBin(3, file.path(dir, "file2"))
>>>
>>> # Display the detected changes
>>> changedFiles(snapshot)
>>> changedFiles(snapshot)$changes
>>> }
>>> \keyword{utilities}
>>> \keyword{file}
>>>
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>>
>>
>> --
>> Scott Kostyshak
>> Economics PhD Candidate
>> Princeton University
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
--
Scott Kostyshak
Economics PhD Candidate
Princeton University
More information about the R-devel
mailing list