[Rd] Comments requested on "changedFiles" function

Karl Millar kmillar at google.com
Sat Sep 7 03:21:32 CEST 2013


Hi Duncan,

I like the interface of this version a lot better, but there's still a
bunch of implementation details that need fixing:

* As previously mentioned, there are important cases where the mtime
values change in ways that this code doesn't detect.
* If the timestamp file (which is usually in the temp directory) gets
deleted (which can happen after a moderate period of inactivity on
some systems), then file_test('-nt', ...) will always return FALSE,
even if the file has changed.
* If files get added or deleted between the two calls to list.files in
fileSnapshot, it will fail with an error.
* If the path is on a remote file system, tempdir is local, and
there's significant clock skew, then you can get incorrect results.

Unfortunately, these aren't just theoretical scenarios -- I've had the
misfortune to run up against all of them in the past.
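
To illustrate the general idea (this is only a rough sketch, not the
attached code, and take_snapshot / compare_snapshots are made-up names):
keep the size and mtime of every file in memory from a single listing,
and compare snapshot against snapshot instead of against a timestamp
file. It assumes that a change in size or mtime is a good enough signal;
an edit that preserves both would still need a digest to be caught.

```r
# Hypothetical helpers; not part of the proposed changedFiles interface.
take_snapshot <- function(path) {
  files <- list.files(path, full.names = TRUE, recursive = TRUE)
  # One listing, one file.info() call.  A file that vanishes in between
  # shows up as a row of NAs (dropped below) rather than as an error.
  info <- file.info(files)
  info[!is.na(info$size), c("size", "mtime"), drop = FALSE]
}

compare_snapshots <- function(before, after) {
  old <- rownames(before)
  new <- rownames(after)
  common <- intersect(old, new)
  i <- match(common, old)
  j <- match(common, new)
  # Flag any difference, not just "newer than": a restored or copied file
  # can have an older mtime.  Both mtimes come from the same file system,
  # so skew against the local clock does not matter.
  differs <- before$size[i] != after$size[j] |
             before$mtime[i] != after$mtime[j]
  list(added   = setdiff(new, old),
       deleted = setdiff(old, new),
       changed = common[differs])
}
```

Usage would be along the lines of s1 <- take_snapshot(dir), then later
compare_snapshots(s1, take_snapshot(dir)). Nothing is written to
tempdir, so there is no timestamp file to lose.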

I've attached code that's loosely based on your implementation that
solves these problems AFAICT.  Alternatively, Hadley's code handles
all of these correctly, with the exception that compare_state doesn't
handle the case where safe_digest returns NA very well.
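
For the NA case, something along these lines might do. This is a sketch
only, not Hadley's code: digest_changed is a made-up name, and treating
"NA in both snapshots" as unchanged is my assumption about the desired
behaviour.

```r
# Hypothetical helper.  `old` and `new` are character vectors of digests
# for the same files in the same order; NA means "could not be computed".
digest_changed <- function(old, new) {
  changed <- old != new                      # NA wherever either side is NA
  one_na  <- xor(is.na(old), is.na(new))     # digest appeared or disappeared
  both_na <- is.na(old) & is.na(new)         # never had a digest
  changed[one_na]  <- TRUE
  changed[both_na] <- FALSE                  # assumption: treat as unchanged
  changed
}
```

The point is just that the result never contains NA, so it can be used
directly for subsetting.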

Regards,

Karl

On Fri, Sep 6, 2013 at 5:40 PM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
> On 13-09-06 7:40 PM, Scott Kostyshak wrote:
>>
>> On Fri, Sep 6, 2013 at 3:46 PM, Duncan Murdoch <murdoch.duncan at gmail.com>
>> wrote:
>>>
>>> On 06/09/2013 2:20 PM, Duncan Murdoch wrote:
>>>>
>>>>
>>>> I have now put the code into a temporary package for testing; if anyone
>>>> is interested, for a few days it will be downloadable from
>>>>
>>>> fisher.stats.uwo.ca/faculty/murdoch/temp/testpkg_1.0.tar.gz
>>>
>>>
>>>
>>> Sorry, error in the URL.  It should be
>>>
>>> http://www.stats.uwo.ca/faculty/murdoch/temp/testpkg_1.0.tar.gz
>>
>>
>> Works well. A couple of things I noticed:
>>
>> (1)
>> md5sum is being called on directories, which causes warnings. (If this
>> is not viewed as undesirable, please ignore the rest of this comment.)
>> Should this be the responsibility of the user (by passing arguments to
>> list.files)? In the example, changing
>> fileSnapshot(dir, file.info=TRUE, md5sum=TRUE)
>> to
>> fileSnapshot(dir, file.info=TRUE, md5sum=TRUE, include.dirs=FALSE,
>> recursive=TRUE)
>>
>> gets rid of the warnings. But perhaps the user just wants to exclude
>> directories for the md5sum calculations. This can't be controlled from
>> fileSnapshot.
>
>
> I don't see the warnings, I just get NA values.  I'll try to see why there's
> a difference.  (One possibility is my platform (Windows); another is that
> I'm generally testing in R-patched and R-devel rather than the 3.0.1 release
> version.)  I would rather suppress the warnings than make the user avoid
> them.
>
>
>>
>> Or, should the "if (md5sum)" chunk subset "fullnames" using file_test
>> or file.info to exclude directories (and then fill in the directories
>> with NA)?
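
A minimal sketch of that second option, assuming "fullnames" is the
character vector of paths being checksummed (md5_skip_dirs is a made-up
name, not something in the test package):

```r
library(tools)  # for md5sum()

# Hypothetical helper: checksum regular files only and leave NA for
# directories (and for anything file.info() cannot stat).
md5_skip_dirs <- function(fullnames) {
  sums <- rep(NA_character_, length(fullnames))
  names(sums) <- fullnames
  isdir <- file.info(fullnames)$isdir
  is_file <- !is.na(isdir) & !isdir
  sums[is_file] <- md5sum(fullnames[is_file])
  sums
}
```

Since md5sum is never called on a directory, this should avoid the
warnings without the user having to change the list.files arguments.
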
>>
>> (2)
>> If I run example(changedFiles) several times, sometimes I get:
>>
>> chngdF> changedFiles(snapshot)
>> File changes:
>>        mtime md5sum
>> file2  TRUE   TRUE
>>
>> and other times I get:
>>
>> chngdF> changedFiles(snapshot)
>> File changes:
>>        md5sum
>> file2   TRUE
>>
>> I wonder why.
>
>
> Sometimes the example runs so quickly that the new version has exactly the
> same modification time as the original.  That's the risk of the mtime check.
> If you put a delay between, you'll get consistent results.
>
> Duncan Murdoch
>
>
>>
>> Scott
>>
>>> sessionInfo()
>>
>> R Under development (unstable) (2013-08-31 r63780)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] testpkg_1.0
>>
>> loaded via a namespace (and not attached):
>> [1] tools_3.1.0
>>>
>>>
>>
>>
>> --
>> Scott Kostyshak
>> Economics PhD Candidate
>> Princeton University
>>
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

