[R] Reading File Sizes: very slow!
Leonard Mada
|eo@m@d@ @end|ng |rom @yon|c@eu
Sat Sep 25 17:11:59 CEST 2021
Dear List Members,
I tried to compute the total file size of each installed package and the
process is terribly slow.
It took roughly 10 minutes for 512 packages (1.6 GB of files in total).
1.) Package Sizes
system.time({
x = size.pkg(file=NULL);
})
# elapsed time: 509 s !!!
# 512 Packages; 1.64 GB;
# R 4.1.1 on MS Windows 10
The code for the size.pkg() function is below and the latest version is
on Github:
https://github.com/discoleo/R/blob/master/Stat/Tools.CRAN.R
Questions:
Is there a way to get the file sizes faster?
Windows itself also takes a while to compute the total size, but on the
order of 10-20 s, not 10 minutes.
Am I missing something?
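One thing that may be worth trying is a minimal sketch that swaps file.info() for file.size(). As far as the base R documentation goes, file.size() is a convenience wrapper that calls file.info() with extra_cols = FALSE, so the additional columns (on Windows, the executable type) are not determined for every file. Whether those extra columns explain the slowdown here is an assumption and has not been measured. The name size.f.pkg.fast is hypothetical and not part of the original code.

```r
# Sketch: sum the file sizes per package directory using file.size().
# size.f.pkg.fast is a hypothetical name, not part of the original code.
size.f.pkg.fast = function(path = NULL) {
	if(is.null(path)) path = R.home("library");
	xd = list.dirs(path = path, full.names = FALSE, recursive = FALSE);
	size.f = function(p) {
		files = list.files(path = file.path(path, p), full.names = TRUE,
			all.files = TRUE, recursive = TRUE);
		# file.size() only asks for the basic columns of file.info()
		sum(file.size(files));
	}
	# vapply returns a numeric vector named by package directory
	vapply(xd, size.f, numeric(1));
}
```

Wrapping a call such as x = size.f.pkg.fast() in system.time() would show whether this makes any difference on a given machine.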
1.b.) Alternative
It occurred to me to read all file sizes first and then use tapply or
aggregate, but I do not see why that should be faster.
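The alternative described above can be sketched as follows: list every file in the library once, take the first path component as the package name, and sum the sizes with tapply. This is only an illustration under the assumption that list.files() returns relative paths separated by "/"; the name size.all.pkg is hypothetical, and no claim is made that it is faster.

```r
# Sketch: list all files once, then aggregate the sizes by package.
# size.all.pkg is a hypothetical name, not part of the original code.
size.all.pkg = function(path = NULL) {
	if(is.null(path)) path = R.home("library");
	# relative paths look like "pkg/R/file": the package is the first component
	files = list.files(path = path, full.names = FALSE,
		all.files = TRUE, recursive = TRUE);
	# keep only the files located inside a subdirectory
	files = files[grepl("/", files, fixed = TRUE)];
	pkg = sub("/.*$", "", files);
	sz = file.size(file.path(path, files));
	# one sum per package
	tapply(sz, pkg, sum);
}
```

Unlike size.f.pkg(), a directory that contains no files at all would not appear in this result.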
Would it be meaningful to benchmark each individual package?
That said, I am not very inclined to wait 10 minutes for each new attempt.
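A per-package benchmark need not run over the whole library. The sketch below times the same file.info() call as size.f.pkg(), but only for the first n package directories, so that a single run stays short. The name time.pkg and the parameter n are hypothetical and chosen for illustration only.

```r
# Sketch: time the size computation for each package, on a subset only.
# time.pkg and n are hypothetical names, not part of the original code.
time.pkg = function(path = NULL, n = 20) {
	if(is.null(path)) path = R.home("library");
	xd = list.dirs(path = path, full.names = FALSE, recursive = FALSE);
	xd = head(xd, n);
	time.f = function(p) {
		tm = system.time({
			files = list.files(path = file.path(path, p), full.names = TRUE,
				all.files = TRUE, recursive = TRUE);
			size = sum(file.info(files)$size);
		});
		# one row per package: size, number of files, elapsed seconds
		data.frame(Name = p, Size = size, Files = length(files),
			Elapsed = tm[["elapsed"]]);
	}
	do.call(rbind, lapply(xd, time.f));
}
```

Sorting the result by Elapsed, or comparing Elapsed against Files, might indicate whether the time is spread evenly over all files or concentrated in a few packages.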
2.) Big Packages
Just as a note: there are a few very large packages (in my list of 512
packages):
   Size (bytes)   Package
1  123,566,287    BH
2  113,578,391    sf
3  112,252,652    rgdal
4   81,144,868    magick
5   77,791,374    openNLPmodels.en
I suspect that sf & rgdal have a lot of duplicated data structures,
duplicated code and/or duplicated libraries - although I am not an
expert in the field and did not check the sources.
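The suspicion above could be checked, at least for byte-identical files, with a sketch like the following. It hashes every file in two directories with tools::md5sum() and joins the two listings on the hash. The name dup.files is hypothetical. The approach only detects exact file duplicates; code that is compiled into different libraries would not be found this way.

```r
# Sketch: find byte-identical files shared by two package directories.
# dup.files is a hypothetical name, not part of the original code.
dup.files = function(dir1, dir2) {
	hash.f = function(d) {
		files = list.files(path = d, full.names = TRUE,
			all.files = TRUE, recursive = TRUE);
		data.frame(File = files, MD5 = unname(tools::md5sum(files)),
			Size = file.size(files));
	}
	h1 = hash.f(dir1);
	h2 = hash.f(dir2);
	# inner join on the hash: one row per pair of identical files
	merge(h1, h2, by = "MD5", suffixes = c(".1", ".2"));
}
```

Assuming both packages are installed, the call would look like dup.files(find.package("sf"), find.package("rgdal")), and the Size.1 column of the result gives the size of each duplicated file.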
Sincerely,
Leonard
=======
# Package Size:
size.f.pkg = function(path=NULL) {
	if(is.null(path)) path = R.home("library");
	xd = list.dirs(path = path, full.names = FALSE, recursive = FALSE);
	size.f = function(p) {
		p = paste0(path, "/", p);
		sum(file.info(list.files(path=p, pattern=".",
			full.names = TRUE, all.files = TRUE, recursive = TRUE))$size);
	}
	sapply(xd, size.f);
}
size.pkg = function(path=NULL, sort=TRUE, file="Packages.Size.csv") {
	x = size.f.pkg(path=path);
	x = as.data.frame(x);
	names(x) = "Size"
	x$Name = rownames(x);
	# Order
	if(sort) {
		id = order(x$Size, decreasing=TRUE)
		x = x[id,];
	}
	if( ! is.null(file)) {
		if( ! is.character(file)) {
			print("Error: Size NOT written to file!");
		} else write.csv(x, file=file, row.names=FALSE);
	}
	return(x);
}