[R] [R-sig-hpc] Quickest way to make a large "empty" file on disk?
jens.oehlschlaegel at truecluster.com
Fri Sep 28 15:36:21 CEST 2012
Jonathan,
ff has a utility function file.resize() that sets a new file size in bytes.
The size is passed as a double, so it is not limited by .Machine$integer.max.
See ?file.resize
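
A minimal sketch of this suggestion, assuming the ff package is installed and that file.resize(path, size) behaves as described in ?file.resize. The scratch path and the small size are hypothetical values chosen so the example runs quickly, and creating the file first is an assumption that file.resize() needs an existing file to work on.

```r
library(ff)

# Hypothetical scratch file; replace with the real output path.
path <- file.path(tempdir(), "empty_image.bin")

# Small illustrative size: 1e6 doubles at 8 bytes each.
# For the case discussed in this thread, n_values would be 17711913600.
n_values <- 1e6
n_bytes  <- n_values * 8      # kept as a double, so it may exceed .Machine$integer.max

# Assumption: file.resize() works on an existing file, so create it first.
file.create(path)
path <- normalizePath(path)   # on Windows this yields a backslash path

ok <- file.resize(path, n_bytes)

# Compare the size reported by the filesystem with the requested size.
size_on_disk <- file.info(path)$size
```

Because the size is an ordinary double, the same call should accept byte counts well beyond 2147483647. Whether the space is actually allocated or only reserved depends on the filesystem.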
Regards
Jens Oehlschlägel
Sent: Thursday, 27 September 2012 at 21:17
From: "Jonathan Greenberg" <jgrn at illinois.edu>
To: r-help <r-help at r-project.org>, r-sig-hpc at r-project.org
Subject: Re: [R-sig-hpc] Quickest way to make a large "empty" file on disk?
Folks:
Asked this question some time ago, and found what appeared (at first) to be
the best solution, but I'm now finding a new problem. First off, it seemed
like ff as Jens suggested worked:
# outdata_ncells = number of rows * number of columns * number of bands in an image:
out<-ff(vmode="double",length=outdata_ncells,filename=filename)
finalizer(out) <- close
close(out)
This was working fine until I attempted to set length to a VERY large
number: outdata_ncells = 17711913600. At 8 bytes per double, this would
create a file that is 131.964GB. Big, but not obscenely so (and certainly
not larger than the filesystem can handle). However, length appears to be
restricted by .Machine$integer.max (I'm on a 64-bit Windows box):
> .Machine$integer.max
[1] 2147483647
Any suggestions on how to solve this problem for much larger file sizes?
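
The numbers above can be checked with a few lines of base R. This is only an arithmetic sketch: the variable names other than outdata_ncells are made up for illustration, and the piece count at the end merely shows the scale of the problem, it is not a method proposed in this thread.

```r
outdata_ncells   <- 17711913600   # value quoted above
bytes_per_double <- 8

# File size in bytes and in units of 1024^3 bytes (about 131.96)
total_bytes <- outdata_ncells * bytes_per_double
total_gib   <- total_bytes / 1024^3

# TRUE: the element count does not fit in a 32-bit R integer
exceeds_int <- outdata_ncells > .Machine$integer.max

# Illustration only: number of pieces if each piece were capped
# at .Machine$integer.max elements
n_pieces <- ceiling(outdata_ncells / .Machine$integer.max)
```

R stores these values as doubles, which represent integers of this magnitude exactly, so the byte count itself is not the obstacle; the limit applies to the length argument.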
--j
On Thu, May 3, 2012 at 10:44 AM, Jonathan Greenberg
<jgrn at illinois.edu> wrote:
> Thanks, all! I'll try these out. I'm trying to work up something that is
> platform independent (if possible) for use with mmap. I'll do some tests
> on these suggestions and see which works best. I'll try to report back in
> a few days. Cheers!
>
> --j
>
>
>
> 2012/5/3 "Jens Oehlschlägel" <jens.oehlschlaegel at truecluster.com>
>
>> Jonathan,
>>
>> On some filesystems (e.g. NTFS, see below) it is possible to create
>> 'sparse' memory-mapped files, i.e. reserving the space without the cost
>> of actually writing initial values.
>> Package 'ff' does this automatically and also allows accessing the file
>> in parallel. Check the example below and see how creating a big file is
>> immediate.
>>
>> Jens Oehlschlägel
>>
>>
>> > library(ff)
>> > library(snowfall)
>> > ncpus <- 2
>> > n <- 1e8
>> > system.time(
>> + x <- ff(vmode="double", length=n, filename="c:/Temp/x.ff")
>> + )
>>    user  system elapsed
>>    0.01    0.00    0.02
>> > # check finalizer, with an explicit filename we should have a 'close' finalizer
>> > finalizer(x)
>> [1] "close"
>> > # if not, set it to 'close' in order to not let slaves delete x on slave shutdown
>> > finalizer(x) <- "close"
>> > sfInit(parallel=TRUE, cpus=ncpus, type="SOCK")
>> R Version: R version 2.15.0 (2012-03-30)
>>
>> snowfall 1.84 initialized (using snow 0.3-9): parallel execution on 2
>> CPUs.
>>
>> > sfLibrary(ff)
>> Library ff loaded.
>> Library ff loaded in cluster.
>>
>> Warning message:
>> In library(package = "ff", character.only = TRUE, pos = 2, warn.conflicts = TRUE, :
>> 'keep.source' is deprecated and will be ignored
>> > sfExport("x") # note: do not export the same ff multiple times
>> > # explicitly opening avoids a gc problem
>> > # opening with 'mmeachflush' instead of 'mmnoflush' is a bit slower but
>> > # prevents OS write storms when the file is larger than RAM
>> > sfClusterEval(open(x, caching="mmeachflush"))
>> [[1]]
>> [1] TRUE
>>
>> [[2]]
>> [1] TRUE
>>
>> > system.time(
>> + sfLapply( chunk(x, length=ncpus), function(i){
>> + x[i] <- runif(sum(i))
>> + invisible()
>> + })
>> + )
>>    user  system elapsed
>>    0.00    0.00   30.78
>> > system.time(
>> + s <- sfLapply( chunk(x, length=ncpus), function(i) quantile(x[i], c(0.05, 0.95)) )
>> + )
>>    user  system elapsed
>>    0.00    0.00    4.38
>> > # for completeness
>> > sfClusterEval(close(x))
>> [[1]]
>> [1] TRUE
>>
>> [[2]]
>> [1] TRUE
>>
>> > csummary(s)
>>              5%  95%
>> Min.    0.04998 0.95
>> 1st Qu. 0.04999 0.95
>> Median  0.05001 0.95
>> Mean    0.05001 0.95
>> 3rd Qu. 0.05002 0.95
>> Max.    0.05003 0.95
>> > # stop slaves
>> > sfStop()
>>
>> Stopping cluster
>>
>> > # with the close finalizer we are responsible for deleting the file explicitly (unless we want to keep it)
>> > delete(x)
>> [1] TRUE
>> > # remove r-side metadata
>> > rm(x)
>> > # truly free memory
>> > gc()
>>
>>
>>
>> *Sent:* Thursday, 3 May 2012 at 00:23
>> *From:* "Jonathan Greenberg" <jgrn at illinois.edu>
>> *To:* r-help <r-help at r-project.org>, r-sig-hpc at r-project.org
>> *Subject:* [R-sig-hpc] Quickest way to make a large "empty" file on disk?
>> R-helpers:
>>
>> What would be the absolute fastest way to make a large "empty" file (e.g.
>> filled with all zeroes) on disk, given a byte size and a given number
>> of empty values? I know I can use writeBin, but the "object" in
>> this case may be far too large to store in main memory. I'm asking because
>> I'm going to use this file in conjunction with mmap to do parallel writes
>> to this file. Say, I want to create a blank file of 10,000 floating point
>> numbers.
>>
>> Thanks!
>>
>> --j
>>
>> --
>> Jonathan A. Greenberg, PhD
>> Assistant Professor
>> Department of Geography and Geographic Information Science
>> University of Illinois at Urbana-Champaign
>> 607 South Mathews Avenue, MC 150
>> Urbana, IL 61801
>> Phone: 415-763-5476
>> AIM: jgrn307, MSN: jgrn307 at hotmail.com, Gchat: jgrn307, Skype: jgrn3007
>> [1]http://www.geog.illinois.edu/people/JonathanGreenberg.html
>>
>>
>> _______________________________________________
>> R-sig-hpc mailing list
>> R-sig-hpc at r-project.org
>> [2]https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
>>
>>
>>
>
>
> --
> Jonathan A. Greenberg, PhD
> Assistant Professor
> Department of Geography and Geographic Information Science
> University of Illinois at Urbana-Champaign
> 607 South Mathews Avenue, MC 150
> Urbana, IL 61801
> Phone: 415-763-5476
> AIM: jgrn307, MSN: jgrn307 at hotmail.com, Gchat: jgrn307, Skype: jgrn3007
> [3]http://www.geog.illinois.edu/people/JonathanGreenberg.html
>
--
Jonathan A. Greenberg, PhD
Assistant Professor
Department of Geography and Geographic Information Science
University of Illinois at Urbana-Champaign
607 South Mathews Avenue, MC 150
Urbana, IL 61801
Phone: 217-300-1924
AIM: jgrn307, MSN: jgrn307 at hotmail.com, Gchat: jgrn307, Skype: jgrn3007
[4]http://www.geog.illinois.edu/people/JonathanGreenberg.html
_______________________________________________
R-sig-hpc mailing list
R-sig-hpc at r-project.org
[5]https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
References
1. http://www.geog.illinois.edu/people/JonathanGreenberg.html
2. https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
3. http://www.geog.illinois.edu/people/JonathanGreenberg.html
4. http://www.geog.illinois.edu/people/JonathanGreenberg.html
5. https://stat.ethz.ch/mailman/listinfo/r-sig-hpc