[R] Clustering for variable reduction
Gad Abraham
gabraham at csse.unimelb.edu.au
Sun Apr 6 08:06:18 CEST 2008
Hi,
I have a regression model, where the explanatory variables are factors,
and I want to include interaction terms, but some combinations occur in
the data very infrequently.
Hence, I'm using hclust and cutree to hierarchically cluster the levels,
and get new combined levels to regress on.
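
A minimal sketch of this setup, for concreteness. The toy data, the
names d, f, y and f_new, and the choice to cluster levels on their mean
response are all assumptions made for illustration, not necessarily the
features actually used:

```r
# Toy data (hypothetical): one factor with 6 levels, some of them rare
set.seed(1)
d <- data.frame(
  f = factor(rep(letters[1:6], times = c(40, 35, 30, 5, 3, 2))),
  y = rnorm(115)
)

# One summary value per level; the mean response is an assumed choice
lev_mean <- tapply(d$y, d$f, mean)

# Cluster the levels and cut the tree into 3 groups (uniform cut)
hc  <- hclust(dist(lev_mean))
grp <- cutree(hc, k = 3)        # named integer vector: level -> cluster id

# Map each observation's level to its cluster to get the combined factor
d$f_new <- factor(unname(grp[as.character(d$f)]))
```

The combined factor d$f_new can then replace d$f in the model formula.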
Ideally, I would like to cut the tree so that every cluster contains at
least k observations. That is, cut each branch at its own appropriate
height, combining nodes only when they have fewer than k observations.
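
One possible way to do this, sketched under assumptions: walk the tree
from the root and split a node only when both of its children hold at
least k observations. The function cut_min_size and its arguments are
hypothetical names, not part of any package; w is assumed to be a named
vector of observation counts per level, with names matching hc$labels.
If the whole data set has fewer than k observations, the single
resulting cluster will too.

```r
# Hypothetical helper: per-branch cut of an hclust tree so that each
# cluster holds at least k observations (given per-level counts w).
cut_min_size <- function(hc, w, k) {
  m <- hc$merge
  n_merge <- nrow(m)
  w <- as.numeric(w[hc$labels])   # align counts with the tree's leaf order

  # Observation count under each internal node (row i of hc$merge);
  # negative entries of hc$merge are leaves, positive ones earlier merges
  size <- numeric(n_merge)
  child_size <- function(j) if (j < 0) w[-j] else size[j]
  for (i in seq_len(n_merge))
    size[i] <- child_size(m[i, 1]) + child_size(m[i, 2])

  # All leaf indices below a node
  leaves <- function(j)
    if (j < 0) -j else c(leaves(m[j, 1]), leaves(m[j, 2]))

  cluster <- integer(length(w))
  next_id <- 0L
  stack <- n_merge                # start at the root (the last merge)
  while (length(stack) > 0) {
    j <- stack[1]
    stack <- stack[-1]
    splittable <- j > 0 &&
      child_size(m[j, 1]) >= k && child_size(m[j, 2]) >= k
    if (splittable) {
      stack <- c(m[j, ], stack)   # descend into both children
    } else {
      next_id <- next_id + 1L     # keep this node as one cluster
      cluster[leaves(j)] <- next_id
    }
  }
  setNames(cluster, hc$labels)
}

# Usage on toy values (hypothetical): 8 levels, the last four are rare
x <- c(a = 0.1, b = 0.15, c = 0.9, d = 1.0,
       e = 2.0, f = 2.1, g = 2.15, h = 2.3)
w <- table(factor(rep(names(x), times = c(90, 75, 60, 30, 18, 12, 9, 6))))
hc <- hclust(dist(x))
cl <- cut_min_size(hc, w, k = 20)
sizes <- tapply(as.vector(w), cl, sum)   # observations per cluster
```

Note that a rare level which only joins the rest of the tree near the
root forces everything below that join into one cluster, so the result
depends heavily on where the rare levels sit in the dendrogram.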
AFAIK, cutree cuts at a single uniform height, and there's no easy way
of extracting the number of observations per cluster from an hclust
object (except by assigning the new levels to the data and then counting
the occurrences).
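
For reference, a small sketch of that counting step which avoids
relabelling the data: sum the per-level counts by cluster. The toy data,
the cut height and the variable names are assumptions for illustration,
as is clustering the levels on their mean response.

```r
# Toy data (hypothetical): a factor with 6 levels and a numeric response
set.seed(1)
f <- factor(rep(letters[1:6], times = c(40, 35, 30, 5, 3, 2)))
y <- rnorm(length(f))

hc  <- hclust(dist(tapply(y, f, mean)))
grp <- cutree(hc, h = 0.3)       # uniform cut; the height is arbitrary

n_lev <- table(f)                # observations per original level
# Observations per cluster: add up the counts of the levels in each one
n_clu <- tapply(as.vector(n_lev), grp[names(n_lev)], sum)
```

This gives the same counts as table(grp[as.character(f)]), but works
from the level counts alone.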
Does anyone know of code that does this already?
Thanks,
Gad
--
Gad Abraham
Dept. CSSE and NICTA
The University of Melbourne
Parkville 3010, Victoria, Australia
email: gabraham at csse.unimelb.edu.au
web: http://www.csse.unimelb.edu.au/~gabraham