[R] tempfile problem

Ben Madin lists at remoteinformation.com.au
Mon Jun 21 00:34:10 CEST 2010


Thanks all for the advice,

I'm hashing the current time, including fractions of a second (to 6 decimal places), so I ran a loop to test it (my attempt at reproducible code!):

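> library(digest)
> op <- options(digits.secs=6)
> a <- character(50000)   # preallocate rather than growing from NA
> for (i in 1:50000) { a[i] <- digest(Sys.time(), algo='crc32') }
> anyDuplicated(a)        # 0 means no hash occurred more than once
> options(op)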

and received no hash more than once, so I'll go with that for now.
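
In case it helps anyone else, here is a minimal sketch of a time-based name that does not rely on the clock alone, given Duncan's warning about more than one file being created within a timer tick. It is only an illustration: make_name is a made-up helper, not anything in R or digest, and it assumes the counter survives for the life of the R session. The process ID makes a clash between sessions less likely but does not rule it out.

```r
# Hypothetical helper (not part of base R or digest): build a file name from
# the time, the process ID and a per-session counter, so that two calls
# within the same timer tick still produce different names.
make_name <- local({
  counter <- 0L
  function(prefix = "nahis", suffix = ".png") {
    counter <<- counter + 1L
    # %OS6 gives seconds with 6 decimal places; drop the decimal point
    stamp <- gsub(".", "", format(Sys.time(), "%Y%m%d%H%M%OS6"), fixed = TRUE)
    paste(prefix, "_", stamp, "_", Sys.getpid(), "_", counter, suffix, sep = "")
  }
})

this_filename <- make_name()
```

The counter is what guarantees uniqueness within one session; the time and process ID only reduce the chance of a clash with other sessions writing to the same directory.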

I did somewhat assume that there would be a suffix option, but it appears that a loop to check for the suffixed name is the responsibility of the user. However, I think it is odd that I can open an R session on three independent machines and receive the same filename suggestions from the first invocation of the tempfile function each time. Because I am using PL/R to access R from PostgreSQL, this means I will always get the same names in the same order, which has left me very confused about how the random seeding process works. I was under the impression that the random seed was set from the system time when the session started, so there should be a very low probability of collisions, especially with a mix of machines and operating systems.
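
For the record, the user-side check I have in mind would look roughly like the sketch below. The function name unique_png_name and its arguments are made up for illustration and are not part of R. It assumes that an empty placeholder file is acceptable, and the check followed by the create is not atomic, so two processes could in principle still race for the same name.

```r
# Hypothetical helper (not part of base R): keep asking tempfile() for names
# until the suffixed name is not in use, then claim it with an empty file.
unique_png_name <- function(pattern = "nahis", tmpdir = tempdir(),
                            suffix = ".png", max_tries = 1000L) {
  for (i in seq_len(max_tries)) {
    candidate <- paste(tempfile(pattern, tmpdir), suffix, sep = "")
    # Claim the name by creating an empty placeholder, so that later
    # calls see the suffixed name as taken
    if (!file.exists(candidate) && file.create(candidate)) {
      return(basename(candidate))
    }
  }
  stop("no unused file name found after ", max_tries, " tries")
}
```

The plot can then be written over the placeholder, e.g. with png(file.path(tmpdir, this_filename)). As Duncan points out, this could get slow if the seed is the same each time and many of the early names are already taken.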

cheers

Ben


On 18/06/2010, at 1:17 , Romain Francois wrote:

> 
> On 17/06/10 18:59, Duncan Murdoch wrote:
>> 
>> On 17/06/2010 12:43 PM, Ben Madin wrote:
>>> G'day all,
>>> 
>>> The documentation for tempfile states :
>>> 
>>> "The names are very likely to be unique among calls to tempfile in an
>>> R session and across simultaneous R sessions. The filenames are
>>> guaranteed not to be currently in use."
>>> 
>>> My problem I think relates to the second part of the sentence, which
>>> is the guarantee... and it is being met ... but I need to save the
>>> files as .png files, in the same directory, so I am adding the suffix
>>> and I suppose therefore the next offering can be the same name (as the
>>> file on disk has the suffix, so the bare name is still not in use)
>>> 
>>> I am using a command like :
>>> 
>>> > fname <- basename(tempfile("nahis",
>>> "/Library/WebServer/Documents/nahis/tmp"))
>>> 
>>> on a mac, or
>>> > fname <- basename(tempfile("nahis", "/htdocs/nahis/tmp"))
>>> 
>>> on a FreeBSD system, as I need to be able to find the file from the
>>> web browser up to 24 hours later.
>>> 
>>> and then
>>> > this_filename <- paste(fname, ".png", sep = "")
>>> 
>>> and saving the file as this_filename, hence the next call doesn't find
>>> its own suggestion, and starts again.
>> 
>> It sounds as though you are doing something strange with the random
>> number seed, because those names are chosen at random, and then checked
>> for uniqueness. If the seed is being reset you could get the same name
>> twice in a row, but otherwise it's very unlikely. (And it's the C
>> library function rand(), not R's RNG that is used.)
>>> Is there any alternative file-naming approach I can use to get around
>>> this? Do I need to manually scan and reject the name if it matches the
>>> names I already have? Should I just digest the current time ? (It's
>>> working so far!)
>> 
>> If you use the current time, watch out for timer accuracy and fast
>> computers. You may be able to get more than one file created before the
>> next timer tick.
>> 
>> I'd suggest that you should generate more than enough filenames once at
>> the start, confirm they're all unique, and then just take them one by
>> one as needed. Alternatively, create the tempfile() as well as the
>> tempfile().png, but this is likely to be really slow if the seed is the
>> same each time, because checking for the existence of the first n tries
>> is going to be slow.
>> 
>> Duncan Murdoch
> 
> Would it not make sense to change the signature of tempfile to this:
> 
> function (pattern = "file", tmpdir = tempdir(), suffix = "" )
> 
> and include the suffix in the "does the file exist" test ?
> 
> Romain
> 
> -- 
> Romain Francois
> Professional R Enthusiast
> +33(0) 6 28 91 30 30
> http://romainfrancois.blog.free.fr
> |- http://bit.ly/98Uf7u : Rcpp 0.8.1
> |- http://bit.ly/c6YnCi : graph gallery collage
> `- http://bit.ly/bZ7ltC : inline 0.3.5
> 
> 


