[R] Information criteria for kmeans
Prof Brian Ripley
ripley at stats.ox.ac.uk
Wed Dec 5 12:24:38 CET 2007
This is not primarily an R question: if you tell us how you want to define
it, we may be able to help you compute it. I presume you are talking
about Schwarz (1978), which is not billed as an 'information criterion'.
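For reference, the form in which the Schwarz criterion is usually quoted for a model with a finite-dimensional parameter vector is -2 log L + p log n. Below is a minimal sketch for a linear regression, assuming simulated data; the names n, x, y, fit and by_hand are hypothetical and not from the original question. It only checks that the hand computation agrees with R's BIC() for an lm fit.

```r
set.seed(1)

# Hypothetical data, for illustration only
n <- 50
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)
fit <- lm(y ~ x)

ll <- logLik(fit)
p  <- attr(ll, "df")   # intercept, slope and error variance, so p = 3

# Criterion computed by hand: -2 log L + p log n
by_hand <- -2 * as.numeric(ll) + p * log(nobs(fit))

# Should agree with the built-in BIC() for this fit
all.equal(by_hand, BIC(fit))
```

Here both the log-likelihood and the parameter count p are well defined, which is what the kmeans case lacks.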
AFAIK, all Gideon Schwarz did was to define a criterion for linear
regression. People have applied it to other situations with a vector
space of parameters. However, in many clustering methods (including
kmeans, and also, for example, classification trees) there is in addition
a combinatorial part of the fit: you optimize over both the cluster centres
and the allocation of units to clusters. Such a fit does not come close to
the Schwarz framework.
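To make the two parts of a kmeans fit concrete, here is a minimal sketch, assuming simulated data of the same shape as in the question (two groups of 25 rows in 4 variables). The names n_centre_coords, n_labels and wss are hypothetical. It shows that the fitted objective depends on a continuous part (the centre coordinates) and a discrete part (one label per unit), and that counting only the former as 'parameters' ignores the latter.

```r
set.seed(1)

# Simulated data: 2 groups x 25 rows x 4 variables (an assumption)
dat <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 4),
             matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 4))

cl <- kmeans(dat, centers = 2, nstart = 10)

# Continuous part of the fit: k * vars centre coordinates
n_centre_coords <- length(cl$centers)   # 2 * 4 = 8

# Combinatorial part of the fit: one discrete label per row of dat
n_labels <- length(cl$cluster)          # 50

# The within-cluster sum of squares is a function of BOTH parts:
# each row is compared with the centre its label points to
wss <- sum((dat - cl$centers[cl$cluster, ])^2)
all.equal(wss, cl$tot.withinss)
```

A penalty such as vars * k * log(n) accounts only for the 8 centre coordinates here; the 50 labels are also chosen to minimize the objective.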
Nor does clustering fit into Akaike's (1973, 1974) information framework.
There is discussion in Banfield & Raftery (1993) of a Schwarz-like
criterion for clustering, but with a rather different derivation, and I
don't think it should be attributed to Schwarz.
On Wed, 5 Dec 2007, Serguei Kaniovski wrote:
>
> Hello,
>
> how is, for example, the Schwarz criterion defined for kmeans? It should
> be something like:
>
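> k <- 2
> vars <- 4
> nobs <- 100   # values drawn per group, i.e. nobs / vars = 25 rows per group
>
> dat <- rbind(matrix(rnorm(nobs, sd = 0.3), ncol = vars),
>              matrix(rnorm(nobs, mean = 1, sd = 0.3), ncol = vars))
>
> colnames(dat) <- paste("var", 1:vars)
>
> (cl <- kmeans(dat, k))
>
> # proposed criterion; dat has nrow(dat) = 50 rows, not nobs = 100
> schwarz <- sum(cl$withinss) + vars * k * log(nrow(dat))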
>
> Thanks for your help,
> Serguei
> ________________________________________
> Austrian Institute of Economic Research (WIFO)
>
> P.O.Box 91 Tel.: +43-1-7982601-231
> 1103 Vienna, Austria Fax: +43-1-7989386
>
> Mail: Serguei.Kaniovski at wifo.ac.at
> http://www.wifo.ac.at/Serguei.Kaniovski
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595