[R-SIG-Finance] Getmansky et al. Smoothing Index

Sat Sep 8 04:16:30 CEST 2007

I am working on implementing a measure for evaluating the relative amount of 
serial correlation caused by smoothing in a return series as described in:

Getmansky, M., A. W. Lo, and I. Makarov. “An Econometric Model of Serial 
Correlation and Illiquidity in Hedge Fund Returns.” Journal of Financial 
Economics 74 (2004), 529-609.

In that paper, the authors argue that there are three possible 
sources of serial correlation in hedge fund returns: time-varying expected 
returns, time-varying leverage and incentive fees with high-water marks.  
They carefully go through all three to argue that none of these can 
effectively explain the high levels of observed serial correlation in the 
context of hedge funds.  With that, they turn their focus towards the 
combination of illiquidity and smoothed returns. 

The remainder of the paper argues that serial correlation can be considered a 
proxy for illiquidity and return smoothing. Even though illiquidity and 
smoothing are two distinct phenomena, they argue for considering them 
together, since one facilitates the other.  The basic argument is that 
return-smoothing behavior yields a more consistent set of returns over time, 
with lower volatility and, therefore, a higher Sharpe ratio, but it also 
produces serial correlation as a side effect.  Part of the motivation here is 
that such a measure would give us a way to compare the relative smoothing 
among our managers.

To measure and alleviate the effects of smoothing, they offer a rather 
complicated solution.  The first part involves estimating the smoothing 
profile using maximum likelihood estimation (MLE) in a fashion similar to the 
estimation of standard moving-average time series models.  They define 
a "smoothing profile" as a vector of coefficients for an MLE fit on returns 
using a two-period moving-average process.  The coefficients, θj, are then 
normalized to sum to 1, so that the reported return can be interpreted as a 
"weighted average of the fund’s true returns over the most recent k + 1 
periods, including the current period."  
In other words, the "information generated at date t may not be fully 
impounded into prices until several periods later."  If the first coefficient 
(θ0) were 0.719, it would imply that only 71.9% of that fund’s true current 
monthly return was reported, with the remaining 28.1% distributed over the 
next two months (recall the constraint that θ0 + θ1 + θ2 = 1).  The estimates 
0.201 and 0.080 for θ1 and θ2 imply that, on average, the current reported 
return also includes 20.1% of last month's true return and 8.0% of the true 
return from two months ago.
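
To make the weighted-average interpretation concrete, here is a minimal 
sketch in R.  The simulated "true" returns and the variable names are made 
up for illustration; only the three weights come from the example above.

```r
# Illustration only: smooth simulated "true" returns with the example
# profile theta = (0.719, 0.201, 0.080).
set.seed(42)
theta    <- c(0.719, 0.201, 0.080)               # theta0, theta1, theta2 (sum to 1)
true_ret <- rnorm(120, mean = 0.01, sd = 0.03)   # hypothetical monthly returns

# Reported return at t = theta0*R[t] + theta1*R[t-1] + theta2*R[t-2].
# With sides = 1, stats::filter applies the weights to current and past values.
obs_ret <- stats::filter(true_ret, filter = theta,
                         method = "convolution", sides = 1)
obs_ret <- as.numeric(obs_ret)[-(1:2)]           # first two values are NA

# Compare volatility and first-order autocorrelation of the two series.
sd_true  <- sd(true_ret[-(1:2)])
sd_obs   <- sd(obs_ret)
rho1_obs <- acf(obs_ret, plot = FALSE)$acf[2]    # lag-1 autocorrelation
```

In a simulation like this the reported series should show a lower standard 
deviation than the true series, and its lag-1 autocorrelation will tend to 
be positive even though the true returns are uncorrelated.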

The measure probably does capture some essence of serial correlation in a 
return series.  If the weights are concentrated on a small number of lags, 
relatively little serial correlation will be induced.  However, if the 
weights are evenly distributed among many lags, more serial correlation will 
be induced.  The Herfindahl Index was originally developed to measure the 
concentration of manufacturers or suppliers in a marketplace, using the 
market shares of member companies in an industry, which has very little to 
do with smoothing.  Getmansky, et al. simply borrow its form, the sum of the 
squared weights, to collapse the coefficients, or "smoothing profile", into a 
single number, or "smoothing index".  In the context of smoothed returns, a 
lower value of the smoothing index implies more smoothing, and the upper 
bound of 1 implies no smoothing. 
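
As a quick sketch of that calculation (the function name smoothing_index is 
my own, not from the paper), the index is just the sum of the squared 
weights once they have been normalized to sum to 1:

```r
# Herfindahl-style smoothing index: sum of squared normalized weights.
smoothing_index <- function(theta) {
  stopifnot(isTRUE(all.equal(sum(theta), 1)))   # weights must sum to 1
  sum(theta^2)
}

xi_example <- smoothing_index(c(0.719, 0.201, 0.080))  # 0.563762
xi_none    <- smoothing_index(c(1, 0, 0))              # 1: no smoothing
xi_even    <- smoothing_index(rep(1/3, 3))             # 1/3: evenly spread
```

With k + 1 non-negative weights the index runs from 1/(k + 1), when the 
weights are spread evenly, up to 1, when all of the weight sits on the 
current period.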

There are a number of issues for implementers lurking in their methodology.  
The first and probably most obvious issue comes from fitting a model to the 
returns series.  The methodology proposed is difficult to understand and 
implement correctly.  Fortunately, there are functions in most popular 
statistics packages that can fit such a model.  There are variations in 
exactly how those algorithms are implemented that may cause the results to be 
difficult to repeat exactly.  But, for the moment, let's pretend I found 
something that comes close to their methodology to use.  

In my tests, the smoothing index that I calculate is not particularly stable 
through time.  When measured over a 36- or 60-month rolling window, values 
wiggle through regions where you might expect and then suddenly spike.  Those 
spikes don't mean that the manager suddenly found a pool of liquidity, or was 
on vacation for a few months and couldn't smooth the returns - they mean that 
the model was mis-specified and the measure isn't valid through that period.
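
For anyone who wants to reproduce that kind of rolling test, a rough sketch 
in base R follows.  The helper names ma2_index and rolling_index and the 
simulated series are hypothetical, and the sketch assumes the implicit 
leading MA coefficient of 1 is included before the weights are normalized; 
it is not necessarily how the results described above were produced.

```r
# Hypothetical helper: smoothing index from an MA(2) fit on one window.
ma2_index <- function(x) {
  fit <- tryCatch(
    arima(x - mean(x), order = c(0, 0, 2), method = "ML",
          transform.pars = TRUE, include.mean = FALSE),
    error = function(e) NULL)                 # a failed fit yields NA
  if (is.null(fit)) return(NA_real_)
  b     <- as.numeric(coef(fit))              # lagged MA coefficients
  theta <- c(1, b) / (1 + sum(b))             # assumed normalization
  sum(theta^2)
}

# Apply the helper to every trailing window of the given width.
rolling_index <- function(ra, width = 36) {
  ends <- seq(width, length(ra))
  sapply(ends, function(i) ma2_index(ra[(i - width + 1):i]))
}

set.seed(1)
ra     <- rnorm(96, mean = 0.01, sd = 0.03)   # simulated returns, illustration only
roll36 <- rolling_index(ra, width = 36)       # one value per window end
```

Plotting roll36 against the window end dates is a simple way to look for 
the sudden spikes described above.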

Getmansky, et al. comment on the possibility of mis-specification, noting that 
the smoothing index "does not always perform well in small samples or when 
the underlying distribution of true returns is not normal as hypothesized."  
They offer three tests for specification: Did the fit converge?  Are all of 
the estimated smoothing coefficients positive?  And are the estimates wildly 
different from those of a linear regression approach (which I didn't 
implement)?
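
The first two checks are easy to sketch against an arima() fit.  The 
function name check_ma_fit and the simulated data are hypothetical, and the 
normalization again assumes the leading coefficient of 1 is included; the 
regression comparison is not attempted here.

```r
# Hypothetical checks for an MA fit returned by stats::arima().
check_ma_fit <- function(fit) {
  b     <- as.numeric(coef(fit))              # lagged MA coefficients
  theta <- c(1, b) / (1 + sum(b))             # assumed normalization
  list(converged    = fit$code == 0,          # optim() convergence code
       all_positive = all(theta > 0),
       theta        = theta)
}

# Simulated MA(2) series; the MA coefficients 0.28 and 0.11 roughly match
# the profile (0.719, 0.201, 0.080) after normalization.
set.seed(7)
x   <- arima.sim(model = list(ma = c(0.28, 0.11)), n = 240)
fit <- arima(x, order = c(0, 0, 2), method = "ML", include.mean = FALSE)
chk <- check_ma_fit(fit)
```

A window that fails either check could be flagged rather than reported, 
which would at least mark the periods where the index shouldn't be trusted.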

The second issue is that they don't constrain the individual fit coefficients 
to [0,1], so the resulting 'smoothing index' is not limited to that range 
either.  As a result, all we can say is that lower values are "less liquid" 
and higher values are "more liquid" or mis-specified.

This group also wrote a second paper that updated the observations of the 
first: "Systemic Risk and Hedge Funds," by Nicholas Chan, Mila Getmansky, 
Shane M. Haas, and Andrew W. Lo, published as an NBER Working Paper 
(No. 11200) in March 2005.  I would note that their reported experience with 
this measure seems much more consistent than mine, which suggests that the 
fitting methodology I'm using is incorrect or more prone to 
mis-specification.

My current draft of the code is attached below.  I'm using the arima() 
function to fit an MA(2) model as follows:

MA2=arima(ra, order=c(0,0,2), method="ML", transform.pars=TRUE, 
include.mean=FALSE)

I'm still scratching my head about whether I'm doing this correctly.  I've 
noticed that the fits are very unstable through time (which makes sense, 
given the normality assumption buried in here), but that would limit its 
utility.  I've noticed that if I extend the model to order=c(0,0,3) it helps 
some, but not a lot.

Three questions:
- Am I using the arima fit function correctly?
- Has someone else implemented this with more rigor?
- Has anyone else found this to be a useful measure?

Thanks in advance,

pcc

`SmoothingIndex` <-
function (ra, ...)
{ # @author Peter Carl

    # Description:
    # SmoothingIndex

    # ra    log return vector

    # Function:

    ra = checkData(ra, method="vector", na.omit=TRUE)

    # the fit below uses include.mean=FALSE, which assumes de-meaned returns
    ra = ra - mean(ra)

    MA2 = NULL
    thetas = 0
    SmoothingIndex = 0

    # First, create a maximum likelihood estimation fit for an MA process.

    # include.mean: Getmansky, et al. JFE 2004 p 555 "By applying the above
    # procedure to observed de-meaned returns...", so set parameter to FALSE
    # transform.pars: ibid, "we impose the additional restriction that the   
    # estimated MA(k)  process be invertible." so set the parameter to TRUE
    MA2 = arima(ra, order=c(0,0,2), method="ML", transform.pars=TRUE, 
include.mean=FALSE)

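    # arima() reports only the lagged MA coefficients; the coefficient on the
    # current innovation is implicitly 1, so include it before normalizing
    # so that theta0 + theta1 + theta2 = 1
    thetas = c(1, as.numeric(MA2$coef))/(1 + sum(MA2$coef))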

    SmoothingIndex = sum(thetas^2) 

    return(SmoothingIndex)

}
-- 
Peter Carl