[R-pkg-devel] How to decrease time to import files in xlsx format?

Igor L
Tue Oct 4 20:29:54 CEST 2022


Hello all,

I'm developing an R package that downloads, imports, cleans and merges
nine xlsx files that a public institution updates monthly.

The problem is that importing xlsx files is time-consuming.

My initial idea was to parallelize the calls to readxl::read_excel
according to the number of cores in the user's processor, but it made
almost no difference: the execution time only went from 185.89 to
184.12 seconds.

# not parallelized code
y <- purrr::map_dfr(paste0(dir.temp, '/', lista.arquivos.locais),
                    readxl::read_excel, sheet = 1, skip = 4,
                    col_types = rep('text', 30))

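# parallelized code
# Note: multicore relies on forking, which future disables on Windows and
# in RStudio, where it falls back to sequential processing.
future::plan(future::multicore, workers = 4)
y <- furrr::future_map_dfr(paste0(dir.temp, '/', lista.arquivos.locais),
                           readxl::read_excel, sheet = 1, skip = 4,
                           col_types = rep('text', 30))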
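
To check whether the workers really run concurrently, independently of
readxl, the two variants can be timed side by side on a dummy workload.
This is only a minimal sketch using the base parallel package: slow_read
is a hypothetical stand-in for readxl::read_excel (it just sleeps), and
the file names are made up, so nothing is read from disk.

```r
library(parallel)

# Hypothetical stand-in for readxl::read_excel: waits, then returns one row.
slow_read <- function(path) {
  Sys.sleep(0.5)
  data.frame(file = path, stringsAsFactors = FALSE)
}

# Hypothetical file names; no files are actually opened.
files <- paste0("file", 1:4, ".xlsx")

# Sequential baseline.
t_seq <- system.time(
  y_seq <- do.call(rbind, lapply(files, slow_read))
)[["elapsed"]]

# PSOCK cluster: separate R sessions, available on every platform.
cl <- makeCluster(2)
t_par <- system.time(
  y_par <- do.call(rbind, parLapply(cl, files, slow_read))
)[["elapsed"]]
stopCluster(cl)

cat("sequential:", t_seq, "s; parallel:", t_par, "s\n")
```

If the parallel timing is not clearly lower even on a dummy task like
this, the workers are probably not running in parallel at all, and the
problem lies in the setup rather than in the xlsx import itself.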

Any suggestions to reduce the import processing time?

Thanks in advance!

-- 
*Igor Laltuf Marques*
Economist (UFF)
Master in Urban and Regional Planning (IPPUR-UFRJ)
Researcher at ETTERN and CiDMob
https://igorlaltuf.github.io/
