# [R-SIG-Finance] correlation based time series clustering?

Vincent Zoonekynd zoonek at gmail.com
Thu Feb 23 11:17:17 CET 2012

```
Here are a few ideas for clustering time series,
with more references.

1. Build the minimum spanning tree on the correlation
matrix (in practice, on a distance derived from it,
such as sqrt(2 * (1 - correlation))). The tree is
usually very noisy: you may want to resample the data
to see how much it changes.
Despite the noise, this usually gives acceptable
results: for instance, you can often recognise
industry groups from daily or weekly stock returns.
A few references:
An Introduction to Econophysics: Correlations and Complexity in
Finance, R.N. Mantegna and H.E. Stanley (2000)
http://arxiv.org/abs/cond-mat/0302546
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1617257
http://arxiv.org/abs/0806.4714
http://arxiv.org/abs/0708.0562
http://arxiv.org/abs/cond-mat/0412411

2. Threshold the correlation matrix and consider the
result as the adjacency matrix of a graph: its
connected components can be interpreted as
clusters.

3. Convert the correlation matrix to a distance
matrix, and apply the standard clustering
algorithms: k-means, hierarchical clustering,
Kohonen networks, etc.  You may want to try these
with various estimators of the correlation matrix:
for instance, shrinkage estimators should help
reduce the noise in the data.

4. If you accept methods not based on correlation,
you can model your time series, e.g., with
econometric models (ARMA, GARCH, etc.), stochastic
differential equations (the "Markov operator
distance" at the end of "Option pricing and
estimation of financial models with R", by
S.M. Iacus), wavelet decomposition, iSAX
(http://www.cs.ucr.edu/~eamonn/iSAX/iSAX.html),
etc., and cluster the coefficients of those
models.

-- Vincent

On 23 February 2012 18:29, julien cuisinier <j_cuisinier at hotmail.com> wrote:
>
> Hi Michael,
>
>
>
> A very general question here, with little input from you... I am not surprised to see little feedback.
>
> I have been looking for something similar, with the same result, so I do not think it exists yet. I am a complete newbie in clustering, but looking around, there are plenty of R functions available; nothing I could find is as simple as using correlation per se.
>
> Thinking about it, I'm not sure how it would work, & anything I can think of would be quite sensitive to the starting point (e.g. calculate pair-wise correlations within a market, then start with one stock & cluster with it all other stocks whose correlations with it exceed a certain threshold?). Maybe some recursive function trying many different starting points? But then what to do with the resulting different cluster structures?
>
> Could you share with the list what references (not in R) you found on the topic? It would be great if you could bring something to the list as well, & then we could see if we can build that in? (very, very ambitious of me here =)
>
>
>
> Thanks & regards,
> Julien

```
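Idea 1 above (a minimum spanning tree built from correlations) can be sketched in a few lines. This is only an illustration, not code from the thread: it is written in plain Python rather than R, it assumes the distance sqrt(2 * (1 - correlation)) used in the econophysics literature, and the series names and return values are made up.

```python
import math

def correlation(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def minimum_spanning_tree(series):
    """Kruskal's algorithm on the distances sqrt(2 * (1 - rho)).

    `series` maps a name to its list of returns; the result is a list of
    (distance, name_a, name_b) edges forming the tree.
    """
    names = sorted(series)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            rho = correlation(series[a], series[b])
            # max() guards against tiny negative values from rounding.
            edges.append((math.sqrt(max(0.0, 2.0 * (1.0 - rho))), a, b))
    edges.sort()

    # Union-find: each node points to a representative of its component.
    parent = {name: name for name in names}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    tree = []
    for d, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:  # keep the edge only if it joins two components
            parent[ra] = rb
            tree.append((d, a, b))
    return tree

# Hypothetical returns: two "banks" and two "oil" stocks (made-up numbers).
returns = {
    "bank_a": [0.010, -0.020, 0.030, -0.010, 0.020],
    "bank_b": [0.012, -0.018, 0.025, -0.012, 0.021],
    "oil_a": [0.020, 0.010, -0.030, 0.020, -0.010],
    "oil_b": [0.018, 0.012, -0.028, 0.022, -0.008],
}
tree = minimum_spanning_tree(returns)
for d, a, b in tree:
    print(f"{a} - {b}: {d:.3f}")
```

To get a feel for the noise mentioned in the post, one could rerun `minimum_spanning_tree` on resampled subsets of the observations and compare the resulting edge lists.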
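Idea 2 above (thresholding the correlation matrix and reading off connected components) might look like the following. Again a minimal sketch in plain Python under stated assumptions: the function name `threshold_clusters`, the correlation values, and the threshold of 0.5 are all hypothetical choices for illustration.

```python
def threshold_clusters(names, corr, threshold):
    """Link two series when their correlation is at least `threshold`,
    then return the connected components of the resulting graph."""
    adjacency = {name: set() for name in names}
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if corr[i][j] >= threshold:
                adjacency[names[i]].add(names[j])
                adjacency[names[j]].add(names[i])

    clusters, seen = [], set()
    for start in names:
        if start in seen:
            continue
        # Depth-first search collects everything reachable from `start`.
        component, stack = [], [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(sorted(component))
    return clusters

# Made-up correlation matrix for four hypothetical stocks.
names = ["bank_a", "bank_b", "oil_a", "oil_b"]
corr = [
    [1.00, 0.80, 0.10, 0.20],
    [0.80, 1.00, 0.15, 0.10],
    [0.10, 0.15, 1.00, 0.70],
    [0.20, 0.10, 0.70, 1.00],
]
print(threshold_clusters(names, corr, 0.5))
```

The result depends heavily on the threshold: a very low value merges everything into one cluster, a very high one leaves every series on its own, so it is worth trying several values.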
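Idea 3 above (turning correlations into distances and applying a standard clustering algorithm) can be illustrated with a small agglomerative clustering routine. This sketch assumes average linkage and the distance sqrt(2 * (1 - correlation)); neither choice comes from the post, and the correlation matrix is made up. In practice one would use an existing implementation (and possibly a shrinkage estimate of the correlation matrix) rather than this naive loop.

```python
import math

def correlation_to_distance(corr):
    """Map each correlation rho to the distance sqrt(2 * (1 - rho))."""
    return [[math.sqrt(max(0.0, 2.0 * (1.0 - rho))) for rho in row]
            for row in corr]

def average_linkage(dist, n_clusters):
    """Naive agglomerative clustering with average linkage.

    Starts from singletons and repeatedly merges the two clusters with
    the smallest mean pairwise distance until `n_clusters` remain.
    Returns clusters as sorted lists of row indices.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(dist[i][j]
                            for i in clusters[a] for j in clusters[b])
                mean = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or mean < best[0]:
                    best = (mean, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return [sorted(c) for c in clusters]

# Made-up correlation matrix: rows 0-1 and rows 2-3 form two groups.
corr = [
    [1.00, 0.80, 0.10, 0.20],
    [0.80, 1.00, 0.15, 0.10],
    [0.10, 0.15, 1.00, 0.70],
    [0.20, 0.10, 0.70, 1.00],
]
dist = correlation_to_distance(corr)
print(average_linkage(dist, 2))
```

The same distance matrix could equally be fed to other algorithms mentioned in the post; only the `average_linkage` step would change.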