[R-SIG-Finance] Incorrect SVD Calculation
Patrick Burns
patrick at burns-stat.com
Thu Jun 28 17:58:41 CEST 2012
Thanks for highlighting that there is risk where
I hadn't expected any.
I've seen a different weird phenomenon in which
R's 'svd' throws an error because of non-convergence
in the algorithm. Changing the order of the rows
makes it go away.
Pat
On 28/06/2012 16:52, Robert Harlow wrote:
> Robert,
> First off, this is a pretty interesting phenomenon, and let me start
> off by saying that I have not figured out what is going on. However, I do
> have some more information to contribute:
> I am unable to recreate the bug in MATLAB, which uses a different
> subroutine from LAPACK: DGESVD whereas R uses DGESDD
>
> I then tried to see if I exported the same random matrix and tried to run
> it through the MATLAB svd function what would happen. However, if I write
> the offending matrix to a csv file, then load it back into R and calculate
> the svd over again, it works and we get sum(sv) ~ 600.
>
> I know this doesn't help, but I guess its more information..
> I'll let you know if I find anything else,
> -Bob
>
> for (i in 1:10) {
> set.seed(i)
> R <- matrix(rnorm(x1*x2), x1, x2)
> ee <- eigen(tcrossprod(R))
>
> ## 'cx' is a positive semi-definite matrix with eigenvalues of 'ev'
> D <- ee$vectors
> cx <- ee$vectors[, 1:x2] %*% tcrossprod(diag(ev), ee$vectors[,1:x2])
>
> ## Scale 'cx' to the corresponding correlation matrix and verify diag=1
> cx <- cov2cor(cx)
> stopifnot(all(diag(cx)==1))
>
> ## Calculate sum of singular values (should be 'x1')
> sv <- sum(svd(cx)$d)
> print(abs(sv - x1))
> if (abs(sv-x1) > 1e-2) {
> write.table(R, "svd.csv", sep = ",", row.names = FALSE, col.names =
> FALSE)
> x.R <- read.table("svd.csv", sep = ",")
> x.R <- as.matrix(x.R)
> R.old <- R
> R <- x.R
> ee <- eigen(tcrossprod(R))
> ## 'cx' is a positive semi-definite matrix with eigenvalues of 'ev'
> D <- ee$vectors
> cx <- ee$vectors[, 1:x2] %*% tcrossprod(diag(ev), ee$vectors[,1:x2])
>
> ## Scale 'cx' to the corresponding correlation matrix and verify
> diag=1
> cx <- cov2cor(cx)
> stopifnot(all(diag(cx)==1))
>
> ## Calculate sum of singular values (should be 'x1')
> sv <- sum(svd(cx)$d)
> browser()
> cat("Seed=", i, "\n")
> stop("Average singular value of correlation matrix is ", round(sv,
> 3),
> ". Should be ", x1)
> }
> }
>
>
>
> On Wed, Jun 27, 2012 at 6:31 PM, McGehee, Robert <
> Robert.McGehee at geodecapital.com> wrote:
>
>> R-Financiers,
>> For any of you who, like us, use SVD for risk modeling and stat arb
>> trading, we've discovered what I think is a very serious bug in R/LAPACK's
>> SVD calculation. Since this bug has the potential to create bogus risk
>> models, I thought this audience might be interested in this post.
>>
>> Specifically, for many matrices that are not full rank and have a few
>> small eigenvalues (much like covariance from stock returns!), SVD may
>> return completely bogus and impossible results: for a correlation matrix
>> the sum of the singular values will be many times too large and the 'u' and
>> 'v' matrix will not be even close to orthonormal. For a random matrix with
>> certain distributions of eigenvalues (first discovered using a covariance
>> from 1-minute bar stock returns), SVD produces bogus results 20%-50% of the
>> time.
>>
>> I've posted the bug report with an example here:
>> https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14962
>>
>> For any of you using risk models or stat arb that use SVD, you may want to
>> see if this error affects you.
>>
>> Also, (selfishly) I wanted to post this out to fellow practitioners
>> already knew about this problem and were using a better algorithm (I
>> noticed there are more than one LAPACK svd algorithm). The slow method of
>> SVD calculation by eigen(crossprod(x)) and eigen(tcrossprod(x)) still works
>> fine (but it's very slow).
>>
>> Last, here's a simple 'svd' replacement function that performs a
>> rudimentary check on the output of svd. It fails if 'u' is not orthonormal.
>> Generally 'u', 'd', and 'v' all seem to give wrong answers at the same
>> time, but no guarantees that problems with 'v' and 'd' won't slip through.
>>
>> svd <- function(...) {
>> x <- base::svd(...)
>> if (!is.complex(x$u) && any(abs(colSums(x$u^2)-1) > 1e-3))
>> stop("svd gave incorrect result")
>> x
>> }
>>
>> Suggestions, comments appreciated.
>> --Robert
>>
>> Robert McGehee, CFA
>> Geode Capital Management, LLC
>> One Post Office Square, 28th Floor | Boston, MA | 02109
>> Direct: (617)392-8396
>>
>> This e-mail, and any attachments hereto, are intended fo...{{dropped:11}}
>>
>> _______________________________________________
>> R-SIG-Finance at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>> -- Subscriber-posting only. If you want to post, subscribe first.
>> -- Also note that this is not the r-help list where general R questions
>> should go.
>>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> R-SIG-Finance at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
> -- Subscriber-posting only. If you want to post, subscribe first.
> -- Also note that this is not the r-help list where general R questions should go.
>
--
Patrick Burns
patrick at burns-stat.com
http://www.burns-stat.com
http://www.portfolioprobe.com/blog
twitter: @portfolioprobe
More information about the R-SIG-Finance
mailing list