Quite right. I just rebuilt R linked against the latest version of LAPACK and all the problems went away. From: Pierre Lapointe [mailto:pierrelap@gmail.com] Sent: Thursday, June 28, 2012 12:08 PM To: Patrick Burns Cc: r-sig-finance@r-project.org; McGehee, Robert Subject: Re: [R-SIG-Finance] Incorrect SVD Calculation According to Peter Dalgard: The issue goes away when relinking against LAPACK 3.4.1. R's generic LAPACK is extracted from version 3.1.1 and could be upgraded with some effort (it isn't a straightforward copy). https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14962#c2 Good catch Robert! Pierre Lapointe Brockhouse Cooper | Global Macro Strategist 514-227-7711 | plapointe@brockhousecooper.com On Thu, Jun 28, 2012 at 11:58 AM, Patrick Burns > wrote: Thanks for highlighting that there is risk where I hadn't expected any. I've seen a different weird phenomenon in which R's 'svd' throws an error because of non-convergence in the algorithm. Changing the order of the rows makes it go away. Pat On 28/06/2012 16:52, Robert Harlow wrote: Robert, First off, this is a pretty interesting phenomenon, and let me start off by saying that I have not figured out what is going on. However, I do have some more information to contribute: I am unable to recreate the bug in MATLAB, which uses a different subroutine from LAPACK: DGESVD whereas R uses DGESDD I then tried to see if I exported the same random matrix and tried to run it through the MATLAB svd function what would happen. However, if I write the offending matrix to a csv file, then load it back into R and calculate the svd over again, it works and we get sum(sv) ~ 600. I know this doesn't help, but I guess its more information.. I'll let you know if I find anything else, -Bob for (i in 1:10) { set.seed(i) R <- matrix(rnorm(x1*x2), x1, x2) ee <- eigen(tcrossprod(R)) ## 'cx' is a positive semi-definite matrix with eigenvalues of 'ev' D <- ee$vectors cx <- ee$vectors[, 1:x2] %*% tcrossprod(diag(ev), ee$vectors[,1:x2]) ## Scale 'cx' to the corresponding correlation matrix and verify diag=1 cx <- cov2cor(cx) stopifnot(all(diag(cx)==1)) ## Calculate sum of singular values (should be 'x1') sv <- sum(svd(cx)$d) print(abs(sv - x1)) if (abs(sv-x1) > 1e-2) { write.table(R, "svd.csv", sep = ",", row.names = FALSE, col.names = FALSE) x.R <- read.table("svd.csv", sep = ",") x.R <- as.matrix(x.R) R.old <- R R <- x.R ee <- eigen(tcrossprod(R)) ## 'cx' is a positive semi-definite matrix with eigenvalues of 'ev' D <- ee$vectors cx <- ee$vectors[, 1:x2] %*% tcrossprod(diag(ev), ee$vectors[,1:x2]) ## Scale 'cx' to the corresponding correlation matrix and verify diag=1 cx <- cov2cor(cx) stopifnot(all(diag(cx)==1)) ## Calculate sum of singular values (should be 'x1') sv <- sum(svd(cx)$d) browser() cat("Seed=", i, "\n") stop("Average singular value of correlation matrix is ", round(sv, 3), ". Should be ", x1) } } On Wed, Jun 27, 2012 at 6:31 PM, McGehee, Robert < Robert.McGehee@geodecapital.com> wrote: R-Financiers, For any of you who, like us, use SVD for risk modeling and stat arb trading, we've discovered what I think is a very serious bug in R/LAPACK's SVD calculation. Since this bug has the potential to create bogus risk models, I thought this audience might be interested in this post. Specifically, for many matrices that are not full rank and have a few small eigenvalues (much like covariance from stock returns!), SVD may return completely bogus and impossible results: for a correlation matrix the sum of the singular values will be many times too large and the 'u' and 'v' matrix will not be even close to orthonormal. For a random matrix with certain distributions of eigenvalues (first discovered using a covariance from 1-minute bar stock returns), SVD produces bogus results 20%-50% of the time. I've posted the bug report with an example here: https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14962 For any of you using risk models or stat arb that use SVD, you may want to see if this error affects you. Also, (selfishly) I wanted to post this out to fellow practitioners already knew about this problem and were using a better algorithm (I noticed there are more than one LAPACK svd algorithm). The slow method of SVD calculation by eigen(crossprod(x)) and eigen(tcrossprod(x)) still works fine (but it's very slow). Last, here's a simple 'svd' replacement function that performs a rudimentary check on the output of svd. It fails if 'u' is not orthonormal. Generally 'u', 'd', and 'v' all seem to give wrong answers at the same time, but no guarantees that problems with 'v' and 'd' won't slip through. svd <- function(...) { x <- base::svd(...) if (!is.complex(x$u) && any(abs(colSums(x$u^2)-1) > 1e-3)) stop("svd gave incorrect result") x } Suggestions, comments appreciated. --Robert Robert McGehee, CFA Geode Capital Management, LLC One Post Office Square, 28th Floor | Boston, MA | 02109 Direct: (617)392-8396 This e-mail, and any attachments hereto, are intended fo...{{dropped:11}} _______________________________________________ R-SIG-Finance@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance -- Subscriber-posting only. If you want to post, subscribe first. -- Also note that this is not the r-help list where general R questions should go. [[alternative HTML version deleted]] _______________________________________________ R-SIG-Finance@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance -- Subscriber-posting only. If you want to post, subscribe first. -- Also note that this is not the r-help list where general R questions should go. -- Patrick Burns patrick@burns-stat.com http://www.burns-stat.com http://www.portfolioprobe.com/blog twitter: @portfolioprobe _______________________________________________ R-SIG-Finance@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance -- Subscriber-posting only. If you want to post, subscribe first. -- Also note that this is not the r-help list where general R questions should go. [[alternative HTML version deleted]]