[R-SIG-Finance] Implementing Sharpe's style analysis with solve.QP
Peter Carl
peter at braverock.com
Tue Feb 26 01:34:55 CET 2008
I've recently become interested in implementing style analysis as described by
William Sharpe in:
http://www.stanford.edu/~wfsharpe/art/sa/sa.htm
In particular, I want to implement the 'constrained' method he discusses using
the quadratic programming approach he advocates. I thought that this might
be easy to do in R using the function 'solve.QP' in the package 'quadprog'.
Easy, of course, is a relative term. I've found the documentation somewhat
oblique, and Googling hasn't produced much in the way of help.
So I thought I would offer what I've been able to figure out and try to get
some feedback. What I have here *seems* to work, although I haven't had to
specify this kind of a problem myself in, well, decades.
According to Sharpe (1994) what we're trying to do is minimize the variance of
the difference between the observed returns and the modeled returns:
min VAR(R.f - SUM[w_i * R.s_i]) = min VAR(F - w*S)
s.t. SUM(w_i) = 1; w_i > 0
where:
R.f Fund returns
R.s Style returns
w_i Style weights
Remembering that VAR(aX + bY) = a^2 VAR(X) + b^2 VAR(Y) + 2ab COV(X,Y),
we can refactor the problem as:
= VAR(R.f) + w'*V*w - 2*w'*COV(R.f,R.s)
where:
V Variance-covariance matrix of style index matrix
C Vector of covariances between the style index and the fund
At this point, we can drop VAR[R.f] as it isn't a function of weights and
multiply both sides by 1/2 to get:
min (1/2) w'*V*w - C'w
s.t. w'*e = 1, w_i > 0
where:
e Vector of 1's
Now, map that to the inputs of solve.QP, which is specified as:
min(-d' b + 1/2 b' D b) with the constraints A' b >= b_0
so:
b is the weight vector,
D is the variance-covariance matrix of the styles
d is the covariance vector between the fund and the styles
Here is the function. The comments embedded within the function discuss how
each of the components is constructed for solve.QP.
`style.QPfit` <-
function(R.fund, R.style, model=FALSE, ...)
{
# INPUTS
# R.fund Vector of a fund return time series
# R.style Matrix of a style return time series
#
#
# OUTPUTS
# weights Vector of optimal style index weights
# @ todo: TE Tracking error between the calc'd and actual fund
# @ todo: Fp Vector of calculated fund time series
# R^2 Coefficient of determination
#
# Check to see if the required libraries are loaded
if(!require("quadprog", quietly=TRUE))
stop("package", sQuote("quadprog"), "is needed. Stopping")
R.fund = checkData(R.fund[,1,drop=FALSE], method="matrix")
R.style = checkData(R.style, method="matrix")
# @todo: Missing data is not allowed, use = "pairwise.complete.obs" ?
style.rows = dim(R.style)[1]
style.cols = dim(R.style)[2]
# Calculate D and d
Dmat = cov(R.style)
dvec = cov(R.fund, R.style)
# To specify A' b >= b_0, we create an nxn matrix Amat and the constraints
# vector b0. First we tackle Amat. The first constraint listed above is
# SUM[w_i] = 1. The left side of the constraint is expressed as a vector
# of 1's:
a1 = rep(1, style.cols)
# In b0, which represents the right side of the equation, this vector is
# paired with the value '1'.
# The second constraint sets the lower bound of the weights to zero.
# First, we set up an identity matrix.
a2 = matrix(0, style.cols, style.cols)
diag(a2) = 1
# It's paired in b0 with a vector of lower bounds set to zeros:
w.min = rep(0, style.cols)
# Construct A from the two components a1 and a2
Amat = t(rbind(a1, a2))
# Construct b_0
b0 = c(1, w.min)
# This is where 'meq' comes in. The ?solve.QP page says:
# meq: the first 'meq' constraints are treated as equality
# constraints, all further as inequality constraints (defaults
# to 0).
# I think that the way to interpret this is: if it is set to a value
#'q' <= n, the ordered constraints numbered less than or equal to 'q' are
# treated as an equality constraint. In this case, we only want to first
# constraint to be treated as an equality, so that the weights would sum
# to exactly 1. So we set meq = 1.
# With 'meq' set to 1, the second constraint (a2) is treated as an
# inequality, so each weight is constrainted to be greater than or equal
# to zero.
optimal <- solve.QP(Dmat, dvec, Amat, bvec=b0, meq=1)
weights = as.data.frame(optimal$solution)
rownames(weights) = colnames(R.style)
colnames(weights) = colnames(R.fund)[1]
# Calculate metrics for the quality of the fit
R.fitted = rowSums(R.style*matrix(rep(t(weights),style.rows),
byrow=T,ncol=style.cols))
R.nonfactor = R.fund - R.fitted
R.squared = as.data.frame(1 - (var(R.nonfactor)/var(R.fund)))
adj.R.squared = as.data.frame(1 - (1 - R.squared)*(style.rows -
1)/(style.rows - style.cols - 1))
rownames(R.squared) = "R-squared"
rownames(adj.R.squared) = "Adj R-squared"
if(model)
# returns the entire output of the model
result = optimal
else
result = list(weights = weights, R.squared = R.squared, adj.R.squared
= adj.R.squared )
# @todo: retain the labels on the weights
# @todo: add other values to output, e.g.,
# result <- list(weights = optimal$solution, R.squared = ,
tracking.error = )
return(result)
}
Here's a contrived example. Using 10 years of monthly index data, I use the
Russell 1000 Growth and Value indexes to "explain" the returns of the S&P.
My prior is that it should roughly split the S&P index in half along the
value and growth lines given how these indexes are constructed, and with a
very high R^2.
Here's the data:
> head(R.fund)
SP500.TR
Jan 1996 0.0340
Feb 1996 0.0093
Mar 1996 0.0096
Apr 1996 0.0147
May 1996 0.0258
Jun 1996 0.0038
> head(R.style)
Russell.1000.Growth Russell.1000.Value
Jan 1996 0.0335 0.0312
Feb 1996 0.0183 0.0076
Mar 1996 0.0013 0.0170
Apr 1996 0.0263 0.0038
May 1996 0.0349 0.0125
Jun 1996 0.0014 0.0008
And here's the execution:
> style.QPfit(R.fund, R.style)
$weights
SP500.TR
Russell.1000.Growth 0.5047724
Russell.1000.Value 0.4952276
$R.squared
SP500.TR
R-squared 0.9880977
$adj.R.squared
SP500.TR
Adj R-squared 0.98793
So, this appears to work (at least for this simple example), and produces a
high R^2 like we would expect. To see the whole output of the solve.QP
model, I can specify 'model' = TRUE:
> style.QPfit(R.fund[,1,drop=FALSE], R.style, model=TRUE)
$solution
[1] 0.5047724 0.4952276
$value
[1] -0.0008888153
$unconstrainted.solution
[1] 0.5040111 0.4815228
$iterations
[1] 2 0
$iact
[1] 1
Make sense?
pcc
--
Peter Carl
145 Scottswood Rd
Riverside, IL 60546
312 307 6346
http://www.braverock.com/~peter
More information about the R-SIG-Finance
mailing list