[R] Exceptional slowness with read.csv
Stevie Pederson
@tephen@peder@on@@u @end|ng |rom gm@||@com
Mon Apr 8 17:18:46 CEST 2024
Hi Dave,
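That's rather frustrating. I've found vroom() (from the vroom package) to be
helpful with large files like this.
Does the following give you any better luck?
library(vroom)
# col_names = FALSE: after skipping, the first line read is a record, not the header
vroom(file_name, delim = ",", skip = 2459465, n_max = 5, col_names = FALSE)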
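If vroom isn't an option, the same slice can be pulled out in base R: open a connection, throw away the known-good lines with readLines(), and keep the next few as raw text so you can see exactly what is in them. This is only a rough sketch. read_line_slice() is a made-up name, and it assumes one record per physical line, so line numbers and record numbers agree.

```r
# Hypothetical helper (not from any package): return lines (skip + 1) through
# (skip + n) of a file as raw text, without parsing anything.
read_line_slice <- function(path, skip, n) {
  con <- file(path, open = "r")
  on.exit(close(con))
  # Reading from an open connection advances it, so this discards the first
  # `skip` lines and the next readLines() call starts right after them.
  if (skip > 0) readLines(con, n = skip)
  readLines(con, n = n)
}

# Made-up example file: a header plus three records, the second one broken.
tmp <- tempfile(fileext = ".csv")
writeLines(c('id,name', '1,"a"', '2,"b', '3,"c"'), tmp)
slice <- read_line_slice(tmp, skip = 2, n = 2)
print(slice)
```

For your file that would be something like read_line_slice(file_name, skip = 2459466, n = 5), since line 1 is the header and the 2459465 good records sit on lines 2 to 2459466. The result can then go to read.csv(text = slice, header = FALSE); header = FALSE matters because read.csv() otherwise takes the first line of the slice as column names.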
Of course, when you know you've got errors & the files are that big, it
can take a bit of work to resolve things. The command-line tools awk & sed
might even be a good way to find the lines that have errors & figure out
a fix, but I certainly don't envy you.
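In the same spirit as the awk & sed idea, here's a rough R sketch for flagging suspect lines. It assumes `"` is the only quote character and that no valid field contains an embedded newline. Under those assumptions a line with an odd number of double quotes must hold an unmatched one, because a correctly escaped quote (`""`) adds two and keeps the count even. find_unbalanced_quote_lines() is just a name made up for illustration.

```r
# Hypothetical helper (not from any package): return the indices of lines whose
# number of double-quote characters is odd, i.e. lines with an unmatched quote.
# Assumes '"' is the quote character and no valid field spans multiple lines.
find_unbalanced_quote_lines <- function(lines) {
  # Count quotes as (length of line) - (length with all quotes removed);
  # working on bytes avoids errors on lines with invalid encodings.
  without_quotes <- gsub('"', "", lines, fixed = TRUE, useBytes = TRUE)
  n_quotes <- nchar(lines, type = "bytes") - nchar(without_quotes, type = "bytes")
  # An escaped quote ("") adds two, so it never makes the count odd.
  which(n_quotes %% 2L == 1L)
}

# Small made-up example: only line 3 has an unmatched quote.
example_lines <- c('a,b,c', '1,"x",2', '3,"y,4', '5,"say ""hi""",6')
find_unbalanced_quote_lines(example_lines)
#> [1] 3
```

On the real file, find_unbalanced_quote_lines(readLines(file_name)) gives physical line numbers, counting the header as line 1.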
All the best
Stevie
On Tue, 9 Apr 2024 at 00:36, Dave Dixon <ddixon using swcp.com> wrote:
> Greetings,
>
> I have a csv file of 76 fields and about 4 million records. I know that
> some of the records have errors - unmatched quotes, specifically.
> Reading the file with readLines and parsing the lines with
> read.csv(text = ...) is really slow. I know that the first 2459465 records are good.
> So I try this:
>
> > startTime <- Sys.time()
> > first_records <- read.csv(file_name, nrows = 2459465)
> > endTime <- Sys.time()
> > cat("elapsed time = ", endTime - startTime, "\n")
>
> elapsed time = 24.12598
>
> > startTime <- Sys.time()
> > second_records <- read.csv(file_name, skip = 2459465, nrows = 5)
> > endTime <- Sys.time()
> > cat("elapsed time = ", endTime - startTime, "\n")
>
> This appears to never finish. I have been waiting over 20 minutes.
>
> So why would (skip = 2459465, nrows = 5) take orders of magnitude longer
> than (nrows = 2459465)?
>
> Thanks!
>
> -dave
>
> PS: readLines(n=2459470) takes 10.42731 seconds.
>