[R] ff package: ff objects don't reload completely on NFS drives from a different machine

Henrik Bengtsson hb at stat.berkeley.edu
Sun Jan 24 01:25:49 CET 2010


Hi,

This could be due to how NFS works.  Note that there can be a delay of
up to 30 seconds before other hosts on the same file system see
updates that were flushed by one machine.  You basically cannot treat
files on a shared NFS file system as if you were working on a single
machine.  You have to add some higher-level protection if your data
sources are to be shared ...and that is not an easy problem if you want
it to be bulletproof.  You need to use a semaphore/mutex or some other
way to communicate when files are updated/flushed/read etc.  I'm still
looking for such a mechanism implemented over a file system that is
bulletproof (without having to rely on a central server).
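
As a rough illustration of the kind of handshake meant above, here is a
minimal sketch in base R.  It is not bulletproof and it is not part of
ff: the helper names publish_ready() and wait_until_ready() and the
".md5" marker file are made up for this example.  It assumes the writer
has already closed the ff object, so that its data have been flushed to
the file, and that a renamed marker file eventually becomes visible to
the other hosts.  The writer publishes a checksum of the data file; the
reader polls until the bytes it sees match that checksum.

```r
# Hypothetical helpers (not part of ff): a checksum "ready marker" handshake.

# Writer side: call only after the data file has been flushed/closed.
publish_ready <- function(data_file, marker = paste0(data_file, ".md5")) {
  tmp <- paste0(marker, ".tmp")
  # Record the checksum of the data as the writer sees it.
  writeLines(unname(tools::md5sum(data_file)), tmp)
  # Rename into place so readers are unlikely to see a half-written marker.
  file.rename(tmp, marker)
}

# Reader side: poll until the marker exists and the bytes visible on this
# host hash to the published checksum.  Returns FALSE on timeout.
wait_until_ready <- function(data_file, marker = paste0(data_file, ".md5"),
                             timeout = 60, interval = 1) {
  deadline <- Sys.time() + timeout
  repeat {
    if (file.exists(marker)) {
      expected <- readLines(marker, n = 1L, warn = FALSE)
      seen <- unname(tools::md5sum(data_file))
      # isTRUE() also covers a missing data file (NA) and an empty marker.
      if (isTRUE(seen == expected)) return(TRUE)
    }
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval)
  }
}
```

On the reading host one would then load and open the ff object only
after wait_until_ready() has returned TRUE for the ff data file.
Checksumming a large file on every poll is expensive, and client-side
caching can still delay when the marker or the data become visible, so
this narrows the window rather than closing it.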

My $.02

/Henrik

On Sat, Jan 23, 2010 at 12:02 PM, Hao Cen <hcen at andrew.cmu.edu> wrote:
> Hi ff users and Jens,
>
> I am using the ff package and it has been working great. Recently I
> noticed unexpected behavior in the ff package: when I save an ff
> matrix on one machine to an NFS drive and load it on another machine from
> the same NFS drive, I get quite a lot of zeros in the matrix. The
> following code reproduces the error:
>
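> library(ff)
> mat = matrix(1:25, 5)
> # create a file-backed ff matrix whose data live in ~/m.ff
> matFF = ff(mat, dim=dim(mat), dimnames = dimnames(mat),
>                dimorder = c(2,1),
>                filename = "~/m.ff", overwrite=TRUE)
> # save() stores only the ff metadata; the data stay in ~/m.ff
> save(matFF, file = "~/mat.ff.rda")
> load(file = "~/mat.ff.rda")
> open(matFF)
> matFF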
>
> If I execute all of the above code on one machine, everything works fine.
> However, when I execute only the last three lines on another machine, I get
>
>> matFF
> ff (open) integer length=25 (25) dim=c(5,5) dimorder=c(2,1)
>     [,1] [,2] [,3] [,4] [,5]
> [1,]    0    0    0    0    0
> [2,]    0    0    0    0    0
> [3,]    0    0    0    0    0
> [4,]    0    0    0    0    0
> [5,]    0    0    0    0    0
>
> If the matrix is larger, say mat = matrix(1:20000, 5), I would get the
> following -- dozens of zeros at the end.
>  ff (open) integer length=20000 (20000) dim=c(5,4000) dimorder=c(2,1)
>      [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]   [,3993] [,3994] [,3995] [,3996] [,3997] [,3998] [,3999] [,4000]
> [1,]     1     6    11    16    21    26    31    36 :   19961   19966   19971   19976   19981   19986   19991   19996
> [2,]     2     7    12    17    22    27    32    37 :   19962   19967   19972   19977   19982   19987   19992   19997
> [3,]     3     8    13    18    23    28    33    38 :   19963   19968   19973   19978   19983   19988   19993   19998
> [4,]     4     9    14    19    24    29    34    39 :   19964   19969   19974   19979   19984   19989   19994   19999
> [5,]     5    10    15    20    25    30    35    40 :       0       0       0       0       0       0       0       0
>
> I tried setting caching = "mmeachflush" in the ff function but it doesn't
> help.  My computing environment is 64-bit Linux, R 2.10, ff 2.1.
>
> If you know what causes the issue or how to solve it, please let me know.
> I would highly appreciate it.
>
> Jeff


