[R] [R-sig-hpc] Quickest way to make a large "empty" file on disk?

jens.oehlschlaegel at truecluster.com
Fri Sep 28 15:36:21 CEST 2012


   Jonathan,
   ff has a utility function, file.resize(), that lets you set a new file size
   in bytes, given as a double (so the size is not limited by
   .Machine$integer.max).
   See ?file.resize
   Regards
   Jens Oehlschlägel
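A minimal sketch of the file.resize() suggestion, assuming package ff is installed. The helper name `make_empty_ff_file()` is hypothetical; only `ff::file.resize(path, size)` comes from ff. Because file.resize() changes the size of an existing file, the sketch creates the file first, and it computes the byte count as a double so that it can exceed `.Machine$integer.max`. Check ?file.resize for platform-specific notes (e.g. Windows paths).

```r
# Hypothetical helper (not part of ff): allocate an "empty" file of
# n_cells * bytes_per_cell bytes using ff::file.resize().
# Assumes package ff is installed.
make_empty_ff_file <- function(path, n_cells, bytes_per_cell = 8) {
  # Compute the size as a double so it can exceed .Machine$integer.max
  size <- as.double(n_cells) * bytes_per_cell
  # file.resize() resizes an existing file, so create an empty one if needed
  if (!file.exists(path)) file.create(path)
  ok <- ff::file.resize(path, size)
  if (!isTRUE(ok)) stop("file.resize() failed for ", path)
  invisible(size)
}

# Small example, cheap to run: 10,000 doubles = 80,000 bytes
# make_empty_ff_file(tempfile(fileext = ".bin"), 10000)
```

For the case in this thread, `make_empty_ff_file(filename, 17711913600)` would request 141,695,308,800 bytes. Whether that space is reserved sparsely depends on the filesystem.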
   Sent: Thursday, 27 September 2012 at 21:17
   From: "Jonathan Greenberg" <jgrn at illinois.edu>
   To: r-help <r-help at r-project.org>, r-sig-hpc at r-project.org
   Subject: Re: [R-sig-hpc] Quickest way to make a large "empty" file on disk?
   Folks:
   I asked this question some time ago and found what appeared (at first) to
   be the best solution, but I'm now running into a new problem. First off,
   ff, as Jens suggested, seemed to work:
   # outdata_ncells = number of rows * number of columns * number of bands in an image
   out <- ff(vmode = "double", length = outdata_ncells, filename = filename)
   finalizer(out) <- "close"
   close(out)
   This was working fine until I set length to a VERY large number:
   outdata_ncells = 17711913600. This would create a file of 131.964 GiB
   (about 141.7 GB). Big, but not obscenely so (and certainly not larger than
   the filesystem can handle). However, length appears to be restricted by
   .Machine$integer.max (I'm on a 64-bit Windows box):
   > .Machine$integer.max
   [1] 2147483647
   Any suggestions on how to solve this problem for much larger file sizes?
   --j
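The numbers in this message can be checked directly. A short base-R sketch; the values are taken from the message, and the 8 bytes per cell follow from `vmode = "double"`.

```r
# Values from the message above
outdata_ncells <- 17711913600      # rows * columns * bands
bytes_per_cell <- 8                # vmode = "double": one 8-byte double per cell

# The cell count exceeds the largest R integer, the limit reported for ff's length
too_long <- outdata_ncells > .Machine$integer.max

# File size in double precision (exact for whole numbers below 2^53)
file_bytes <- outdata_ncells * bytes_per_cell
file_gib <- file_bytes / 2^30      # binary gigabytes (GiB)

print(too_long)                    # TRUE
print(format(file_bytes, big.mark = ",", scientific = FALSE))
print(round(file_gib, 3))          # 131.964
```

So the "131.964GB" figure is in binary units; in decimal units the file is about 141.7 GB.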
   On Thu, May 3, 2012 at 10:44 AM, Jonathan Greenberg
   <jgrn at illinois.edu> wrote:
   > Thanks, all! I'll try these out. I'm trying to work up something that is
   > platform independent (if possible) for use with mmap. I'll do some tests
   > on these suggestions and see which works best. I'll try to report back in
   > a few days. Cheers!
   >
   > --j
   >
   >
   >
   > 2012/5/3 "Jens Oehlschlägel" <jens.oehlschlaegel at truecluster.com>
   >
   >> Jonathan,
   >>
   >> On some filesystems (e.g. NTFS, see below) it is possible to create
   >> 'sparse' memory-mapped files, i.e. to reserve the space without the cost
   >> of actually writing initial values.
   >> Package 'ff' does this automatically and also allows parallel access to
   >> the file. Check the example below and note that creating a big file is
   >> immediate.
   >>
   >> Jens Oehlschlägel
   >>
   >>
   >> > library(ff)
   >> > library(snowfall)
   >> > ncpus <- 2
   >> > n <- 1e8
   >> > system.time(
   >> + x <- ff(vmode="double", length=n, filename="c:/Temp/x.ff")
   >> + )
   >>    user  system elapsed
   >>    0.01    0.00    0.02
   >> > # check finalizer: with an explicit filename we should have a 'close' finalizer
   >> > finalizer(x)
   >> [1] "close"
   >> > # if not, set it to 'close' so that the slaves do not delete x when they shut down
   >> > finalizer(x) <- "close"
   >> > sfInit(parallel=TRUE, cpus=ncpus, type="SOCK")
   >> R Version: R version 2.15.0 (2012-03-30)
   >>
   >> snowfall 1.84 initialized (using snow 0.3-9): parallel execution on 2 CPUs.
   >>
   >> > sfLibrary(ff)
   >> Library ff loaded.
   >> Library ff loaded in cluster.
   >>
   >> Warning message:
   >> In library(package = "ff", character.only = TRUE, pos = 2, warn.conflicts = TRUE, :
   >>   'keep.source' is deprecated and will be ignored
   >> > sfExport("x") # note: do not export the same ff multiple times
   >> > # explicitly opening avoids a gc problem;
   >> > # opening with 'mmeachflush' instead of 'mmnoflush' is a bit slower but
   >> > # prevents OS write storms when the file is larger than RAM
   >> > sfClusterEval(open(x, caching="mmeachflush"))
   >> [[1]]
   >> [1] TRUE
   >>
   >> [[2]]
   >> [1] TRUE
   >>
   >> > system.time(
   >> + sfLapply( chunk(x, length=ncpus), function(i){
   >> + x[i] <- runif(sum(i))
   >> + invisible()
   >> + })
   >> + )
   >>    user  system elapsed
   >>    0.00    0.00   30.78
   >> > system.time(
   >> + s <- sfLapply( chunk(x, length=ncpus), function(i) quantile(x[i], c(0.05, 0.95)) )
   >> + )
   >>    user  system elapsed
   >>    0.00    0.00    4.38
   >> > # for completeness
   >> > sfClusterEval(close(x))
   >> [[1]]
   >> [1] TRUE
   >>
   >> [[2]]
   >> [1] TRUE
   >>
   >> > csummary(s)
   >> 5% 95%
   >> Min. 0.04998 0.95
   >> 1st Qu. 0.04999 0.95
   >> Median 0.05001 0.95
   >> Mean 0.05001 0.95
   >> 3rd Qu. 0.05002 0.95
   >> Max. 0.05003 0.95
   >> > # stop slaves
   >> > sfStop()
   >>
   >> Stopping cluster
   >>
   >> > # with the close finalizer we are responsible for deleting the file explicitly (unless we want to keep it)
   >> > delete(x)
   >> [1] TRUE
   >> > # remove r-side metadata
   >> > rm(x)
   >> > # truly free memory
   >> > gc()
   >>
   >>
   >>
   >> *Sent:* Thursday, 3 May 2012 at 00:23
   >> *From:* "Jonathan Greenberg" <jgrn at illinois.edu>
   >> *To:* r-help <r-help at r-project.org>, r-sig-hpc at r-project.org
   >> *Subject:* [R-sig-hpc] Quickest way to make a large "empty" file on disk?
   >> R-helpers:
   >>
   >> What would be the absolute fastest way to make a large "empty" file
   >> (e.g. filled with all zeroes) on disk, given a byte size and a number of
   >> empty values? I know I can use writeBin, but the "object" in this case
   >> may be far too large to store in main memory. I'm asking because I'm
   >> going to use this file in conjunction with mmap to do parallel writes to
   >> it. Say I want to create a blank file of 10,000 floating-point numbers.
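One base-R way to do this without building the data in memory, offered as a sketch rather than the approach this thread settled on: open a binary connection, `seek()` to the position of the last byte, and write a single zero byte with `writeBin()`. The helper name `create_empty_file()` is hypothetical. On filesystems that support sparse files the skipped region is usually not physically written, and bytes that were never written read back as zeros. Note that ?seek warns against relying on `seek()` on Windows, so test there before depending on it.

```r
# Hypothetical helper: make a file of exactly `size` bytes, all reading as zero,
# without allocating the data in memory. Overwrites `path` if it exists.
create_empty_file <- function(path, size) {
  size <- as.double(size)               # doubles hold byte counts exactly below 2^53
  stopifnot(length(size) == 1, size >= 1)
  con <- file(path, open = "wb")        # binary write mode; truncates an existing file
  on.exit(close(con))
  seek(con, where = size - 1, rw = "write")  # move to the position of the last byte
  writeBin(as.raw(0), con)              # writing one byte there sets the file length
  invisible(path)
}

# The example from the question: 10,000 doubles at 8 bytes each
fn <- tempfile(fileext = ".bin")
create_empty_file(fn, 10000 * 8)
file.info(fn)$size                      # 80000
x <- readBin(fn, what = "double", n = 10000)
all(x == 0)                             # TRUE
```

Only one byte is written, so the cost does not grow with the file size the way a full `writeBin()` of the data would.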
   >>
   >> Thanks!
   >>
   >> --j
   >>
   >> --
   >> Jonathan A. Greenberg, PhD
   >> Assistant Professor
   >> Department of Geography and Geographic Information Science
   >> University of Illinois at Urbana-Champaign
   >> 607 South Mathews Avenue, MC 150
   >> Urbana, IL 61801
   >> Phone: 415-763-5476
   >> AIM: jgrn307, MSN: jgrn307 at hotmail.com, Gchat: jgrn307, Skype: jgrn3007
   >> [1]http://www.geog.illinois.edu/people/JonathanGreenberg.html
   >>
   >>
   >> _______________________________________________
   >> R-sig-hpc mailing list
   >> R-sig-hpc at r-project.org
   >> [2]https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
   >>
   >>
   >>
   >
   >
   >
   --
   Jonathan A. Greenberg, PhD
   Assistant Professor
   Department of Geography and Geographic Information Science
   University of Illinois at Urbana-Champaign
   607 South Mathews Avenue, MC 150
   Urbana, IL 61801
   Phone: 217-300-1924
   AIM: jgrn307, MSN: jgrn307 at hotmail.com, Gchat: jgrn307, Skype: jgrn3007
   [4]http://www.geog.illinois.edu/people/JonathanGreenberg.html

References

   1. http://www.geog.illinois.edu/people/JonathanGreenberg.html
   2. https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
   3. http://www.geog.illinois.edu/people/JonathanGreenberg.html
   4. http://www.geog.illinois.edu/people/JonathanGreenberg.html
   5. https://stat.ethz.ch/mailman/listinfo/r-sig-hpc


