[R-SIG-Finance] Implementing Sharpe's style analysis with solve.QP

Tue Feb 26 01:34:55 CET 2008

I've recently become interested in implementing style analysis as described by 
William Sharpe in:
  http://www.stanford.edu/~wfsharpe/art/sa/sa.htm

In particular, I want to implement the 'constrained' method he discusses using 
the quadratic programming approach he advocates.  I thought that this might 
be easy to do in R using the function 'solve.QP' in the package 'quadprog'.  
Easy, of course, is a relative term.  I've found the documentation somewhat 
oblique, and Googling hasn't produced much in the way of help.  

So I thought I would offer what I've been able to figure out and try to get 
some feedback.  What I have here *seems* to work, although I haven't had to 
specify this kind of a problem myself in, well, decades.

According to Sharpe (1994) what we're trying to do is minimize the variance of 
the difference between the observed returns and the modeled returns:
min VAR(R.f - SUM[w_i * R.s_i]) = min VAR(F - w*S)
  s.t. SUM(w_i) = 1; w_i > 0

where:
R.f   Fund returns
R.s   Style returns
w_i   Style weights

Remembering that VAR(aX + bY) = a^2 VAR(X) + b^2 VAR(Y) + 2ab COV(X,Y), 
we can refactor the problem as:

= VAR(R.f) + w'*V*w - 2*w'*COV(R.f,R.s)

where:
V     Variance-covariance matrix of style index matrix
C     Vector of covariances between the style index and the fund

At this point, we can drop VAR[R.f] as it isn't a function of weights and 
multiply both sides by 1/2 to get:

min (1/2) w'*V*w - C'w
  s.t. w'*e = 1, w_i > 0

where:
e     Vector of 1's

Now, map that to the inputs of solve.QP, which is specified as:
min(-d' b + 1/2 b' D b) with the constraints A' b >= b_0

so:
b is the weight vector,
D is the variance-covariance matrix of the styles
d is the covariance vector between the fund and the styles

Here is the function.  The comments embedded within the function discuss how 
each of the components is constructed for solve.QP.

`style.QPfit` <-
function(R.fund, R.style, model=FALSE, ...) 
{
# INPUTS
# R.fund   Vector of a fund return time series
# R.style   Matrix of a style return time series
# 
# 
# OUTPUTS
# weights   Vector of optimal style index weights
# @ todo: TE  Tracking error between the calc'd and actual fund
# @ todo: Fp  Vector of calculated fund time series
# R^2  Coefficient of determination
#
    # Check to see if the required libraries are loaded
    if(!require("quadprog", quietly=TRUE))
        stop("package", sQuote("quadprog"), "is needed.  Stopping")

    R.fund = checkData(R.fund[,1,drop=FALSE], method="matrix")
    R.style = checkData(R.style, method="matrix")

    # @todo: Missing data is not allowed, use = "pairwise.complete.obs" ?
    style.rows = dim(R.style)[1]
    style.cols = dim(R.style)[2]

    # Calculate D and d
    Dmat = cov(R.style)
    dvec = cov(R.fund, R.style)

    # To specify A' b >= b_0, we create an nxn matrix Amat and the constraints 
    # vector b0.  First we tackle Amat.  The first constraint listed above is 
    # SUM[w_i] = 1.  The left side of the constraint is expressed as a vector 
    # of 1's:
    a1 = rep(1, style.cols)

    # In b0, which represents the right side of the equation, this vector is 
    # paired with the value '1'.

    # The second constraint sets the lower bound of the weights to zero. 
    # First, we set up an identity matrix.  
    a2 = matrix(0, style.cols, style.cols)
    diag(a2) = 1

    # It's paired in b0 with a vector of lower bounds set to zeros:
    w.min = rep(0, style.cols)

    # Construct A from the two components a1 and a2
    Amat = t(rbind(a1, a2))

    # Construct b_0
    b0 = c(1, w.min)

    # This is where 'meq' comes in.  The ?solve.QP page says:
    #     meq: the first 'meq' constraints are treated as equality
    #     constraints, all further as inequality constraints (defaults
    #     to 0).
    # I think that the way to interpret this is: if it is set to a value 
    #'q' <= n, the ordered constraints numbered less than or equal to 'q' are
    # treated as an equality constraint.  In this case, we only want to first
    # constraint to be treated as an equality, so that the weights would sum
    # to exactly 1.  So we set meq = 1.

    # With 'meq' set to 1, the second constraint (a2) is treated as an
    # inequality, so each weight is constrainted to be greater than or equal
    # to zero.
    optimal <- solve.QP(Dmat, dvec, Amat, bvec=b0, meq=1)

    weights = as.data.frame(optimal$solution)
    rownames(weights) = colnames(R.style)
    colnames(weights) = colnames(R.fund)[1]

    # Calculate metrics for the quality of the fit
    R.fitted = rowSums(R.style*matrix(rep(t(weights),style.rows), 
byrow=T,ncol=style.cols))
    R.nonfactor = R.fund - R.fitted

    R.squared = as.data.frame(1 - (var(R.nonfactor)/var(R.fund)))
    adj.R.squared = as.data.frame(1 - (1 - R.squared)*(style.rows - 
1)/(style.rows - style.cols - 1))

    rownames(R.squared) = "R-squared"
    rownames(adj.R.squared) = "Adj R-squared"

    if(model) 
        # returns the entire output of the model
        result = optimal
    else 
        result = list(weights = weights, R.squared = R.squared, adj.R.squared 
= adj.R.squared )

    # @todo: retain the labels on the weights
    # @todo: add other values to output, e.g., 
    #    result <- list(weights = optimal$solution, R.squared = , 
tracking.error = )

    return(result)
}

Here's a contrived example.  Using 10 years of monthly index data, I use the 
Russell 1000 Growth and Value indexes to "explain" the returns of the S&P.  
My prior is that it should roughly split the S&P index in half along the 
value and growth lines given how these indexes are constructed, and with a 
very high R^2.

Here's the data: 

> head(R.fund)
         SP500.TR 
Jan 1996   0.0340 
Feb 1996   0.0093 
Mar 1996   0.0096 
Apr 1996   0.0147 
May 1996   0.0258 
Jun 1996   0.0038 
> head(R.style)
         Russell.1000.Growth Russell.1000.Value
Jan 1996              0.0335             0.0312
Feb 1996              0.0183             0.0076
Mar 1996              0.0013             0.0170
Apr 1996              0.0263             0.0038
May 1996              0.0349             0.0125
Jun 1996              0.0014             0.0008

And here's the execution:

> style.QPfit(R.fund, R.style)
$weights
                     SP500.TR
Russell.1000.Growth 0.5047724
Russell.1000.Value  0.4952276

$R.squared
           SP500.TR
R-squared 0.9880977

$adj.R.squared
              SP500.TR
Adj R-squared  0.98793

So, this appears to work (at least for this simple example), and produces a 
high R^2 like we would expect.  To see the whole output of the solve.QP 
model, I can specify 'model' = TRUE:

> style.QPfit(R.fund[,1,drop=FALSE], R.style, model=TRUE)
$solution
[1] 0.5047724 0.4952276

$value
[1] -0.0008888153

$unconstrainted.solution
[1] 0.5040111 0.4815228

$iterations
[1] 2 0

$iact
[1] 1

Make sense?

pcc
-- 
Peter Carl
145 Scottswood Rd
Riverside, IL 60546
312 307 6346
http://www.braverock.com/~peter