[Rd] R-2.15.2 changes in computation speed. Numerical precision?

Fri Dec 14 18:07:09 CET 2012

On 14.12.2012 07:55, Paul Johnson wrote:
> On Thu, Dec 13, 2012 at 3:33 AM, Uwe Ligges
> <ligges at statistik.tu-dortmund.de> wrote:
>> Long message, but as far as I can see, this is not about base R but the
>> contributed package Amelia: Please discuss possible improvements with its
>> maintainer.
>>
> Thanks for answering, but I'm really surprised by your answer. The
> package is a constant, but R got much slower  between R-2.15.1 and
> R-2.15.2. The profile of functions called radically changed, svd gets
> called much more because solve() fails much more often.

I have screened your mail and I have seen statements and code re. 
Amelia. If you are talking about R, please provide reproducible examples 
without overhead of packages. The CRAN check times of > 4000 packages 
are typically a good indicator, and they are a bit slower for R-2.15.2 
but not that it is generally worrying (since we also run more checks).

> No change in R could account for it?  I'm not saying R is wrong, it
> may be more accurate and better. After chasing the slowdown for a
> week, I need some comfort. Does the LINPACK -> LAPACK change play a
> role. The change I'm looking for is something that would substantially
> tune up mathematical precision so that matrices that did not seem
> singular before are now, in the eyes of functions like chol() and
> solve().  Whereas in R-2.15.1, a matrix can be inverted by solve(),
> for example, now R-2.15.2 rejects the matrix as singular.

Then a LAPACK upgrade for bugfix reasons, I believe.

> I will take the problem up with applications, of course. But you see
> how package writers might think its ridiculous.   They argue, "I had a
> perfectly fine calculation against R-2.15.0 and R-2.15.1, and now with
> R-2.15.2 it takes three times as long?  And you want me to revise my
> package?"
>
> Would you be persuaded there's an R base question if I showed you a
> particular matrix that can be decomposed or solved in R-2.15.1 but
> cannot be in R-2.15.2? I should have thought of that before, I suppose
> :)

Yes, a minimal reproducible example would be good, but then it is 
probably a LAPACK issue and we have to convince the LAPACK people to 
improve code.

Best,
uwe

>
> pj
>> Best,
>> Uwe Ligges
>>
>>
>>
>> On 12.12.2012 19:14, Paul Johnson wrote:
>>>
>>> Speaking of optimization and speeding up R calculations...
>>>
>>> I mentioned last week I want to speed up calculation of generalized
>>> inverses. On Debian Wheezy with R-2.15.2, I see a huge speedup using a
>>> souped up generalized inverse algorithm published by
>>>
>>> V. N. Katsikis, D. Pappas, Fast computing of theMoore-Penrose inverse
>>> matrix, Electronic Journal of Linear Algebra,
>>> 17(2008), 637-650.
>>>
>>> I was so delighted to see the computation time drop on my Debian
>>> system that I boasted to the WIndows users and gave them a test case.
>>> They answered back "there's no benefits, plus Windows is faster than
>>> Linux".
>>>
>>> That sent me off on a bit of a goose chase, but I think I'm beginning
>>> to understand the situation.  I believe R-2.15.2 introduced a tighter
>>> requirement for precision, thus triggering longer-lasting calculations
>>> in many example scripts. Better algorithms can avoid some of that
>>> slowdown, as you see in this test case.
>>>
>>> Here is the test code you can run to see:
>>>
>>> http://pj.freefaculty.org/scraps/profile/prof-puzzle-1.R
>>>
>>> It downloads a data file from that same directory and then runs some
>>> multiple imputations with the Amelia package.
>>>
>>> Here's the output from my computer
>>>
>>> http://pj.freefaculty.org/scraps/profile/prof-puzzle-1.Rout
>>>
>>> That includes the profile of the calculations that depend on the
>>> ordinary generalized inverse algorithm based on svd and the new one.
>>>
>>> See? The KP algorithm is faster.  And just as accurate as
>>> Amelia:::mpinv or MASS::ginv (for details on that, please review my
>>> notes in http://pj.freefaculty.org/scraps/profile/qrginv.R).
>>>
>>> So I asked WIndows users for more detailed feedback, including
>>> sessionInfo(), and I noticed that my proposed algorithm is not faster
>>> on Windows--WITH OLD R!
>>>
>>> Here's the script output with R-2.15.0, shows no speedup from the
>>> KPginv algorithm
>>>
>>> http://pj.freefaculty.org/scraps/profile/prof-puzzle-1-Windows.Rout
>>>
>>> On the same machine, I updated to R-2.15.2, and we see the same
>>> speedup from the KPginv algorithm
>>>
>>>
>>> http://pj.freefaculty.org/scraps/profile/prof-puzzle-1-CRMDA02-WinR2.15.2.Rout
>>>
>>> After that, I realized it is an R version change, not an OS
>>> difference, I was a bit relieved.
>>>
>>> What causes the difference in this case?  In the Amelia code, they try
>>> to avoid doing the generalized inverse by using the ordinary solve(),
>>> and if that fails, then they do the generalized inverse. In R 2.15.0,
>>> the near singularity of the matrix is ignored, but not in R 2.15.2.
>>> The ordinary solve is failing almost all the time, thus triggering the
>>> use of the svd based generalized inverse.  Which is slower.
>>>
>>> The Katsikis and Pappas 2008 algorithm is the fastest one I've found
>>> after translating from Matlab to R.  It is not so universally
>>> applicable as svd based methods, it will fail if there are linearly
>>> dependent columns. However, it does tolerate columns of all zeros,
>>> which seems to be the problem case in the particular application I am
>>> testing.
>>>
>>> I tried very hard to get the newer algorithm described here to go as
>>> fast, but it is way way slower, at least in the implementations I
>>> tried:
>>> ##  KPP
>>> ## Vasilios N. Katsikis, Dimitrios Pappas, Athanassios Petralias.  "An
>>> improved method for
>>> ## the computation of the Moore Penrose inverse matrix," Applied
>>> ## Mathematics and Computation, 2011
>>>
>>> The notes on that are in the qrginv.R file linked above.
>>>
>>> The fact that I can't make that newer KPP algorithm go faster,
>>> although the authors show it can go faster in Matlab, leads me to a
>>> bunch of other questions and possibly the need to implement all of
>>> this in C with LAPACK or EIGEN or something like that, but at this
>>> point, I've got to return to my normal job.  If somebody is good at
>>> R's .Call interface and can make a pure C implementation of KPP.
>>>
>>> I think the key thing is that with R-2.15.2, there is an svd-related
>>> bottleneck in the multiple imputation algorithms in Amelia. The
>>> replacement version of the function Amelia:::mpinv does reclaim a 30%
>>> time saving, while generating imputations that are identical, so far
>>> as i can tell.
>>>
>>> pj
>>>
>>>
>>>
>>
>
>
>