[Rd] write.csv
Joris Meys
jorismeys at gmail.com
Tue Jul 4 15:49:19 CEST 2017
I tested this myself. The "reason" why write.csv() does not give any error
is that a file does get created. I tested the following with a USB stick
that had only 32 MB of free space:
write.csv(data.frame(V  = rnorm(2e7),
                     V2 = rnorm(2e7),
                     V3 = rnorm(2e7)),
          file = "G:/Test.csv")
X <- read.csv("G:/Test.csv")
Gives:
> str(X)
'data.frame': 506336 obs. of 4 variables:
$ X : int 1 2 3 4 5 6 7 8 9 10 ...
$ V : num 0.0666 -1.2052 -0.2288 -0.4758 1.9168 ...
$ V2: num -0.304 -1.766 -1.611 -0.221 -1.118 ...
$ V3: num -0.6774 0.0841 0.2062 1.7053 -0.2105 ...
So the first part of the data actually is stored. I totally agree that at
least a warning could be given to tell you that not all lines were saved.
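Until such a warning exists, the read-back above can be wrapped in a small check. This is only a minimal sketch: csvRowsMatch() is a hypothetical helper name, not part of R, and it compares row counts only, so it would miss a file that is cut off in the middle of its very last row.

```r
# Hypothetical helper (not part of R): read the file back and compare
# the number of rows on disk with the number of rows in the data frame.
csvRowsMatch <- function(df, file) {
  # An empty or unreadable file counts as zero rows written
  written <- tryCatch(nrow(read.csv(file)),
                      error = function(e) 0L)
  if (written != nrow(df)) {
    warning(sprintf("only %d of %d rows found in '%s'",
                    written, nrow(df), file))
    return(FALSE)
  }
  TRUE
}
```

Usage would be write.csv(mydf, file = "G:/Test.csv") followed by csvRowsMatch(mydf, "G:/Test.csv"). Reading everything back roughly doubles the I/O, so for big files this is slow.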
While Duncan's reaction might come off as a bit direct, please understand
that the R developers are not employees but volunteers. You can demand
things from a company, but in the case of R that's actually rather rude,
even when not intended that way.
Given my limited C skills and my wife hating it when I'm solving other
people's problems in the middle of the night, I'm not going to hack on the
R core myself. But for now, I can offer you this very naive function, which
is very time-consuming for big datasets, to check beforehand whether you
have enough space:
testSpace <- function(df, dir){
  # Rough character count of the data: separators, quotes, row names
  # and line endings are not counted
  totchar <- do.call(sum,
                     lapply(df,
                            function(i) sum(nchar(as.character(i)))))
  # On Windows!
  path <- path.expand(dir)
  # Keep only the drive letter, e.g. "G:"
  path <- gsub("(^[A-Z]{1}:)/.*", "\\1", path)
  # One line per drive: caption and free space in bytes
  disks <- system("wmic logicaldisk get freespace, caption",
                  intern = TRUE)
  available <- disks[grep(path, disks)]
  available <- gsub("\\D", "", available)
  # Assume 2 bytes per char in UTF-8, which is very liberal
  # but not uncommon
  totchar * 2 < as.numeric(available)
}
Gives after about half a minute:
> mydf <- data.frame(V=rnorm(1e7))
> testSpace(mydf, "G:/text.csv")
[1] FALSE
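A cheaper way to get the size estimate, sketched here under stated assumptions: write only an evenly spaced sample of rows to a temporary file and scale the resulting file size up to the full data frame. estimateCsvBytes() and its argument n are hypothetical names, and the result is only as good as the sample is representative of the rest of the data.

```r
# Hypothetical helper (not part of R): estimate the size in bytes of the
# file that write.csv() would produce, from a sample of about n rows.
estimateCsvBytes <- function(df, n = 1000) {
  if (nrow(df) == 0) return(0)
  n <- min(n, nrow(df))
  # Evenly spaced rows, so that row names of all lengths are represented
  idx <- unique(round(seq(1, nrow(df), length.out = n)))
  tmp <- tempfile(fileext = ".csv")
  on.exit(unlink(tmp))
  write.csv(df[idx, , drop = FALSE], file = tmp)
  # Scale the sample size up to all rows (the header is counted too,
  # which makes the estimate slightly too high)
  file.size(tmp) / length(idx) * nrow(df)
}
```

The result can be compared with the free space in bytes, obtained for instance as in testSpace() above. Because it measures what write.csv() really writes, separators, quotes and row names are included without having to guess a number of bytes per character.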
Best regards
Joris
On Tue, Jul 4, 2017 at 2:40 PM, Lipatz Jean-Luc <jean-luc.lipatz at insee.fr>
wrote:
> I would really like the bug fixed. At least this one, because I know
> people in my institute who use this function.
> I understand your arguments about open source, but I also saw on this
> mailing list a proposed fix for this bug that got no answer from the
> people who are able to include it in the distribution. It looks as if
> there are interesting bugs and then the other ones.
> I don't understand the other arguments: the example was reproduced with a
> simple USB key, and you cannot assume that a disk will always be empty
> enough, especially when it has several users.
>
> JLL
>
>
> -----Original Message-----
> From: Duncan Murdoch [mailto:murdoch.duncan at gmail.com]
> Sent: Tuesday, July 4, 2017 14:24
> To: Lipatz Jean-Luc; r-devel at r-project.org
> Subject: Re: [Rd] write.csv
>
> On 04/07/2017 5:40 AM, Lipatz Jean-Luc wrote:
> > Hi all,
> >
> > I am currently studying how to generalize the usage of R in my
> > statistical institute, and I encountered a problem that I cannot report
> > on Bugzilla (I cannot understand why).
>
> Bugzilla was badly abused by spammers last year, so you need to have your
> account created manually by one of the admins to post there. Write to me
> privately if you'd like me to create an account for you. (If you want it
> attached to a different email address, that's fine.)
>
> > Sorry for trying this mailing list, but I am really worried about the
> > problem itself and the possible implications of using R in a
> > professional data production context.
> > The issue with 'write.csv' is that it just doesn't check whether there
> > is enough space on disk and doesn't report a failure to write data.
> >
> > Example (R 3.4.0, Windows 32-bit, but I reproduced the problem with
> > older versions and under Mac OS X)
> >
> >> fwrite(as.list(1:1000000),"G:/Test")
> > Error in fwrite(as.list(1:1e+06), "G:/Test") :
> > No space left on device: 'G:/Test'
> >> write.csv(1:1000000,"G:/Test")
> >>
> >
> > I have a big concern here, because it means that you could save some
> > important data at one point in time and discover long afterwards that
> > you actually lost it.
> > I suppose that the fix is relatively straightforward, but how can we
> > be sure that there is no other function with the same bad properties?
>
> R is open source. You could work out the patch for this bug, and in the
> process see the pattern of coding that leads to it. Then you'll know if
> other functions use the same buggy pattern.
>
> > Is the lesson that you should not use an R function, even from the
> > core, without having personally tested it against extreme conditions?
>
> I think the answer to that is yes. Most people never write such big
> files that they fill their disk: if they did, all sorts of things would
> go wrong on their systems. So this kind of extreme condition isn't
> often tested. It's not easy to test in a platform independent way: R
> would need to be able to create a volume with a small capacity. That's
> a very system-dependent thing to do.
>
> > And wouldn't it be the developers' job to do such elementary tests?
>
> Again, R is open source. You can and should contribute code (and
> therefore become one of the developers) if you are working in unusual
> conditions.
>
> R states quite clearly in the welcome message every time it starts: "R
> is free software and comes with ABSOLUTELY NO WARRANTY." This is
> essentially the same lack of warranty that you get with commercial
> software, though it's stated a lot more clearly.
>
> Duncan Murdoch
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
--
Joris Meys
Statistical consultant
Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics
tel : +32 (0)9 264 61 79
Joris.Meys at Ugent.be