[R] [R-sig-hpc] Quickest way to make a large "empty" file on disk?

"Jens Oehlschlägel" jens.oehlschlaegel at truecluster.com
Thu May 3 13:28:52 CEST 2012


   Jonathan,
   On some filesystems (e.g. NTFS, see below) it is possible to create 'sparse'
   memory-mapped files, i.e. to reserve the space without the cost of actually
   writing initial values.
   Package 'ff' does this automatically and also allows the file to be accessed
   in parallel. Check the example below and see that creating the big file is
   immediate.
   Jens Oehlschlägel
   > library(ff)
   > library(snowfall)
   > ncpus <- 2
   > n <- 1e8
   > system.time(
   + x <- ff(vmode="double", length=n, filename="c:/Temp/x.ff")
   + )
          user      system     elapsed
          0.01        0.00        0.02
   > # check finalizer, with an explicit filename we should have a 'close'
   > # finalizer
   > finalizer(x)
   [1] "close"
   > # if not, set it to 'close' in order not to let slaves delete x on slave
   > # shutdown
   > finalizer(x) <- "close"
   > sfInit(parallel=TRUE, cpus=ncpus, type="SOCK")
   R Version:  R version 2.15.0 (2012-03-30)
   snowfall 1.84 initialized (using snow 0.3-9): parallel execution on 2 CPUs.
   > sfLibrary(ff)
   Library ff loaded.
   Library ff loaded in cluster.
   Warning message:
   In library(package = "ff", character.only = TRUE, pos = 2, warn.conflicts =
   TRUE,  :
     'keep.source' is deprecated and will be ignored
   > sfExport("x") # note: do not export the same ff multiple times
   > # explicitly opening avoids a gc problem
   > # opening with 'mmeachflush' instead of 'mmnoflush' is a bit slower but
   > # prevents OS write storms when the file is larger than RAM
   > sfClusterEval(open(x, caching="mmeachflush"))
   [[1]]
   [1] TRUE
   [[2]]
   [1] TRUE
   > system.time(
   + sfLapply( chunk(x, length=ncpus), function(i){
   +   x[i] <- runif(sum(i))
   +   invisible()
   + })
   + )
          user      system     elapsed
          0.00        0.00       30.78
   > system.time(
   + s <- sfLapply( chunk(x, length=ncpus),
   +   function(i) quantile(x[i], c(0.05, 0.95)) )
   + )
          user      system     elapsed
          0.00        0.00        4.38
   > # for completeness
   > sfClusterEval(close(x))
   [[1]]
   [1] TRUE
   [[2]]
   [1] TRUE
   > csummary(s)
                5%  95%
   Min.    0.04998 0.95
   1st Qu. 0.04999 0.95
   Median  0.05001 0.95
   Mean    0.05001 0.95
   3rd Qu. 0.05002 0.95
   Max.    0.05003 0.95
   > # stop slaves
   > sfStop()
   Stopping cluster
   > # with the close finalizer we are responsible for deleting the file
   > # explicitly (unless we want to keep it)
   > delete(x)
   [1] TRUE
   > # remove r-side metadata
   > rm(x)
   > # truly free memory
   > gc()
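
For comparison, here is a minimal base-R sketch that is not part of the original reply. It assumes a sparse file is not needed and simply writes real zero bytes in chunks, so the full vector never has to be held in memory. The function name `make_zero_file` and its parameters are hypothetical, chosen only for illustration. Unlike the 'ff' approach above, this pays the full cost of writing every byte.

```r
# Sketch: create a zero-filled file of n values without building the whole
# vector in memory. Names and the default chunk size are illustrative only.
make_zero_file <- function(path, n, bytes_per_value = 8L, chunk_values = 1e6) {
  con <- file(path, open = "wb")
  on.exit(close(con))
  remaining <- n
  while (remaining > 0) {
    k <- min(remaining, chunk_values)
    # raw(m) is a vector of m zero bytes; an all-zero bit pattern is 0 as a double
    writeBin(raw(k * bytes_per_value), con)
    remaining <- remaining - k
  }
  invisible(path)
}

# Usage: the 10,000 doubles mentioned in the question below
path <- tempfile(fileext = ".bin")
make_zero_file(path, n = 10000)
file.size(path)                                   # 10000 * 8 = 80000 bytes
zeros <- readBin(path, what = "double", n = 10000)
all(zeros == 0)                                   # TRUE
unlink(path)
```

Writing raw zero bytes rather than `numeric(k)` keeps each chunk's memory footprint identical to what lands on disk, and the same function would serve for 4-byte values by passing `bytes_per_value = 4L`.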
   Sent: Thursday, 03 May 2012 at 00:23
   From: "Jonathan Greenberg" <jgrn at illinois.edu>
   To: r-help <r-help at r-project.org>, r-sig-hpc at r-project.org
   Subject: [R-sig-hpc] Quickest way to make a large "empty" file on disk?
   R-helpers:
   What would be the absolute fastest way to make a large "empty" file (e.g.
   filled with all zeroes) on disk, given a byte size and a given number
   of empty values? I know I can use writeBin, but the "object" in
   this case may be far too large to store in main memory. I'm asking because
   I'm going to use this file in conjunction with mmap to do parallel writes
   to this file. Say, I want to create a blank file of 10,000 floating point
   numbers.
   Thanks!
   --j
   --
   Jonathan A. Greenberg, PhD
   Assistant Professor
   Department of Geography and Geographic Information Science
   University of Illinois at Urbana-Champaign
   607 South Mathews Avenue, MC 150
   Urbana, IL 61801
   Phone: 415-763-5476
   AIM: jgrn307, MSN: jgrn307 at hotmail.com, Gchat: jgrn307, Skype: jgrn3007
   [1]http://www.geog.illinois.edu/people/JonathanGreenberg.html
   _______________________________________________
   R-sig-hpc mailing list
   R-sig-hpc at r-project.org
   [2]https://stat.ethz.ch/mailman/listinfo/r-sig-hpc

References

   1. http://www.geog.illinois.edu/people/JonathanGreenberg.html
   2. https://stat.ethz.ch/mailman/listinfo/r-sig-hpc

