[Rd] stats::reshape quadratic in number of input columns
Toby Hocking
tdhock5 @end|ng |rom gm@||@com
Tue Oct 29 01:17:32 CET 2019
Hi R-core,
I have been performance-testing R packages for wide-to-tall data reshaping,
and for the most part their timings differ only by constant factors.
However, in one test, which involves reshaping into multiple output
columns, I see that stats::reshape is in fact quadratic in the number of
input columns. For example, take the iris data, which has 4 input columns
to reshape; the desired output has columns named Species, Sepal, Petal,
and dimension (where dimension is either Length or Width). Of course there
is no performance issue with N=4 input columns in the original iris data,
but I made larger versions of this reshaping problem by making copies of
the input columns. The results at
https://github.com/tdhock/nc-article#28-oct-2019 show that the quadratic
time complexity causes significant slowdowns beyond about N=10,000 input
columns (e.g., several minutes for stats::reshape versus several seconds
for data.table::melt).
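For concreteness, the reshape described above can be written with base R's stats::reshape roughly as follows. This is a minimal sketch, not the benchmark code from the linked repository; the id column name `flower` is an illustrative assumption (reshape creates it because it is not already in iris).

```r
# Wide-to-tall reshape of iris into columns Species, Sepal, Petal, dimension,
# using only base R's stats::reshape.
tall <- stats::reshape(
  iris,
  direction = "long",
  varying = list(
    c("Sepal.Length", "Sepal.Width"),   # stacked into the Sepal column
    c("Petal.Length", "Petal.Width")),  # stacked into the Petal column
  v.names = c("Sepal", "Petal"),        # names of the two output value columns
  timevar = "dimension",                # records which input column a value came from
  times = c("Length", "Width"),         # values stored in the dimension column
  idvar = "flower")                     # assumed name; reshape adds ids 1..150
head(tall)
```

Each of the 150 input rows yields one output row per dimension, so the result has 300 rows.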
For a fix, I would suggest looking into how the data.table package
implements the same operation; in my test, its computation times appear to
be linear.
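A scaling experiment of the kind described above might look like the sketch below. The helpers `wide_iris` and `reshape_wide` are hypothetical (they are not the script behind the linked results): the first widens iris by copying its 4 measurement columns, giving N = 4 * copies input columns, and the second reshapes all copies into Sepal and Petal columns. If run time is quadratic in N, each 5-fold increase in N should multiply the elapsed time by roughly 25; linear scaling would multiply it by roughly 5.

```r
# Hypothetical helper: repeat the 4 iris measurement columns `copies` times,
# giving N = 4 * copies input columns (plus Species).
wide_iris <- function(copies) {
  meas <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")]
  wide <- do.call(cbind, rep(list(meas), copies))
  # Unique names such as Sepal.Length.1, Sepal.Width.1, ..., Petal.Width.<copies>
  names(wide) <- paste0(names(meas), ".", rep(seq_len(copies), each = 4))
  wide$Species <- iris$Species
  wide
}

# Hypothetical helper: reshape every Sepal.* column into Sepal and every
# Petal.* column into Petal; dimension records e.g. "Length.2".
reshape_wide <- function(wide) {
  cols <- setdiff(names(wide), "Species")
  sepal <- grep("^Sepal", cols, value = TRUE)
  petal <- grep("^Petal", cols, value = TRUE)  # same order as sepal
  stats::reshape(wide, direction = "long",
                 varying = list(sepal, petal), v.names = c("Sepal", "Petal"),
                 timevar = "dimension", times = sub("^Sepal\\.", "", sepal),
                 idvar = "flower")
}

# Time stats::reshape for increasing N (kept small here so it runs quickly).
for (copies in c(10L, 50L, 250L)) {
  wide <- wide_iris(copies)
  secs <- system.time(reshape_wide(wide))[["elapsed"]]
  cat(sprintf("N = %d input columns: %.2f s\n", 4L * copies, secs))
}
```

The same wide data frames could then be passed to data.table::melt for a side-by-side comparison.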
Toby
More information about the R-devel mailing list