[Rd] write.csv

Joris Meys jorismeys at gmail.com
Tue Jul 4 15:49:19 CEST 2017


I tested this myself, and the "reason" why write.csv() does not give any
error is that a file is in fact created. I tested the following with a USB
stick containing only 32 MB of free space:

write.csv(data.frame(V=rnorm(2e7),
                     V2= rnorm(2e7),
                     V3 = rnorm(2e7)),
          file = "G:/Test.csv")

X <- read.csv("G:/Test.csv")

Gives:

> str(X)
'data.frame':    506336 obs. of  4 variables:
 $ X : int  1 2 3 4 5 6 7 8 9 10 ...
 $ V : num  0.0666 -1.2052 -0.2288 -0.4758 1.9168 ...
 $ V2: num  -0.304 -1.766 -1.611 -0.221 -1.118 ...
 $ V3: num  -0.6774 0.0841 0.2062 1.7053 -0.2105 ...

So the first part of the data actually is stored. I totally agree that at
least a warning should be given to tell you that not all lines were saved.
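
As a stopgap, such a warning can be emulated from user code by counting the
lines of the file after writing. The following is only a minimal sketch:
write_csv_checked() and count_lines() are hypothetical helpers, not part of
base R, and the check assumes that no field contains an embedded newline.

```r
# Hypothetical helper: count the lines of a text file in chunks,
# so the whole file never has to sit in memory at once.
count_lines <- function(path, chunk = 100000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  n <- 0
  repeat {
    got <- length(readLines(con, n = chunk, warn = FALSE))
    if (got == 0L) break
    n <- n + got
  }
  n
}

# Hypothetical wrapper: write with write.csv(), then warn if the file
# holds fewer lines than expected (one header line plus one per row).
# Assumption: no field contains an embedded newline.
write_csv_checked <- function(df, path, ...) {
  write.csv(df, file = path, ...)
  n_lines  <- count_lines(path)
  expected <- nrow(df) + 1
  if (n_lines != expected) {
    warning(sprintf("'%s' holds %.0f lines, expected %.0f: output is probably truncated",
                    path, n_lines, expected))
    return(invisible(FALSE))
  }
  invisible(TRUE)
}
```

A call such as write_csv_checked(mydf, "G:/Test.csv") would then warn on a
full disk. Note that a file cut off in the middle of its very last line still
has the expected line count, so this check is not airtight.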

While Duncan's reaction might come off as a bit direct, please understand
that the R developers are not employees but volunteers. You can demand
things from a company, but in the case of R that's actually rather rude,
even when not intended that way.

Given my limited C skills and my wife hating it when I'm solving other
people's problems in the middle of the night, I'm not going to hack on the R
core myself. But for now, I can offer you this very naive and, for big
datasets, very time-consuming function to check beforehand whether you have
enough space:

testSpace <- function(df,dir){
   # Total number of characters needed to print every value
   # (separators, quotes and row names are not counted)
   totchar <- do.call(sum,
                      lapply(df,
                             function(i) sum(nchar(as.character(i)))))
   # On Windows! Extract the drive letter, e.g. "G:"
   path <- path.expand(dir)
   path <- gsub("(^[A-Z]{1}:)/.*","\\1",path)

   # wmic reports the free space of each logical disk in bytes
   disks <- system("wmic logicaldisk get freespace, caption",
                   intern = TRUE)

   available <- disks[grep(path,disks,fixed = TRUE)]
   available <- gsub("\\D","",available)
   # Assume 2 bytes per char in UTF-8, which is very liberal
   # but not uncommon
   totchar*2 < as.numeric(available)
}

Gives after about half a minute:

> mydf <- data.frame(V=rnorm(1e7))
> testSpace(mydf, "G:/text.csv")
[1] FALSE
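
Converting every value to character is what makes the check slow. A cheaper
estimate, sketched below under stated assumptions, writes only a random
sample of rows to a temporary file and scales the resulting file size up to
the full data frame. estimate_csv_bytes() is a hypothetical helper, and the
result is only an estimate that assumes the sampled rows are representative.

```r
# Hypothetical helper: estimate the size in bytes of the file that
# write.csv() would produce, based on a random sample of rows.
estimate_csv_bytes <- function(df, n_sample = 1000L) {
  n <- nrow(df)
  if (n == 0L) return(0)
  idx <- sort(sample.int(n, min(n, n_sample)))
  tmp <- tempfile(fileext = ".csv")
  on.exit(unlink(tmp))
  # The sampled rows keep their original row names, so the width of
  # the row-name column is representative as well
  write.csv(df[idx, , drop = FALSE], file = tmp)
  # Scale the size of the sample file up to the full number of rows
  file.size(tmp) / length(idx) * n
}
```

The estimate can then be compared with the free space reported by wmic, as
in testSpace(). The sample file is written to R's temporary directory, which
therefore needs a little free space itself.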

Best regards
Joris

On Tue, Jul 4, 2017 at 2:40 PM, Lipatz Jean-Luc <jean-luc.lipatz at insee.fr>
wrote:

> I would really like the bug fixed. At least this one, because I know
> people in my institute who use this function.
> I understand your arguments about open source, but I also saw on this
> mailing list a proposed fix for this bug that got no answer from the
> people who are able to include it in the distribution. It looks as if
> there were interesting bugs and then the other ones.
> I don't understand the other arguments: the example was reproduced with a
> simple USB key, and you cannot assume that a disk will always be empty
> enough, especially when it has several users.
>
> JLL
>
>
> -----Original Message-----
> From: Duncan Murdoch [mailto:murdoch.duncan at gmail.com]
> Sent: Tuesday, 4 July 2017 14:24
> To: Lipatz Jean-Luc; r-devel at r-project.org
> Subject: Re: [Rd] write.csv
>
> On 04/07/2017 5:40 AM, Lipatz Jean-Luc wrote:
> > Hi all,
> >
> > I am currently studying how to generalize the usage of R in my
> > statistical institute and I encountered a problem that I cannot report
> > on Bugzilla (I cannot understand why).
>
> Bugzilla was badly abused by spammers last year, so you need to have your
> account created manually by one of the admins to post there.  Write to me
> privately if you'd like me to create an account for you.  (If you want it
> attached to a different email address, that's fine.)
>
> > Sorry for trying this mailing list but I am really worried about the
> > problem itself and the possible implications of using R in a
> > professional data production context.
> > The issue with 'write.csv' is that it just doesn't check whether there
> > is enough space on disk and doesn't report a failure to write data.
> >
> > Example (R 3.4.0, Windows 32-bit, but I reproduced the problem with
> > older versions and under Mac OS X)
> >
> >> fwrite(as.list(1:1000000),"G:/Test")
> > Error in fwrite(as.list(1:1e+06), "G:/Test") :
> >   No space left on device: 'G:/Test'
> >> write.csv(1:1000000,"G:/Test")
> >>
> >
> > I have a big concern here, because it means that you could save some
> > important data at one point in time and discover a long time afterwards
> > that you actually lost them.
> > I suppose that the fix is relatively straightforward, but how can we
> > be sure that there is not another function with the same bad properties?
>
> R is open source.  You could work out the patch for this bug, and in the
> process see the pattern of coding that leads to it.  Then you'll know if
> other functions use the same buggy pattern.
>
> > Is the lesson that you should not use an R function, even from the core,
> > without having personally tested it against extreme conditions?
>
> I think the answer to that is yes.  Most people never write such big
> files that they fill their disk:  if they did, all sorts of things would
> go wrong on their systems.  So this kind of extreme condition isn't
> often tested.  It's not easy to test in a platform independent way:  R
> would need to be able to create a volume with a small capacity.  That's
> a very system-dependent thing to do.
>
> > And wouldn't it be the work of the developers to do such elementary
> > tests?
>
> Again, R is open source.  You can and should contribute code (and
> therefore become one of the developers) if you are working in unusual
> conditions.
>
> R states quite clearly in the welcome message every time it starts: "R
> is free software and comes with ABSOLUTELY NO WARRANTY."  This is
> essentially the same lack of warranty that you get with commercial
> software, though it's stated a lot more clearly.
>
> Duncan Murdoch
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics

tel :  +32 (0)9 264 61 79
Joris.Meys at Ugent.be
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
