[R] Clustering for variable reduction

Gad Abraham gabraham at csse.unimelb.edu.au
Sun Apr 6 08:06:18 CEST 2008


Hi,

I have a regression model, where the explanatory variables are factors, 
and I want to include interaction terms, but some combinations occur in 
the data very infrequently.

Hence, I'm using hclust and cutree to hierarchically cluster the levels, 
and get new combined levels to regress on.
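
For illustration, here is a minimal sketch of that setup. The data are 
made up, and clustering the levels on their mean response is only an 
assumption for the example (any other per-level summary could be passed 
to dist instead).

```r
# Hypothetical data: one factor with 8 levels of very unequal frequency.
f <- factor(rep(letters[1:8], times = c(60, 50, 40, 20, 12, 8, 6, 4)))
set.seed(1)
y <- rnorm(length(f), mean = as.integer(f))

# Assumption: levels are compared by their mean response.
level_means <- tapply(y, f, mean)
hc <- hclust(dist(level_means))

# Uniform cut into 3 groups; cutree returns a vector named by level.
grp <- cutree(hc, k = 3)

# Map every observation to the combined level of its original level.
f_new <- factor(grp[as.character(f)])
table(f_new)
```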

Ideally, I would like to cut the tree so that every cluster contains at 
least k observations. That is, cut each branch at its own height, 
combining nodes only when they have fewer than k observations.
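
To make the idea concrete, here is a rough sketch of one way it might be 
done by walking the merge matrix of the hclust object from the root 
down, splitting a node only when both of its children hold at least k 
observations. The function name cut_min_size, its arguments, and the 
example data are all hypothetical; this is not an existing R function.

```r
# Hypothetical helper (not part of base R): cut an hclust tree so that
# every resulting cluster holds at least k observations.
#   hc     : result of hclust() on the factor levels
#   counts : observations per level, in the same order as hc$labels
cut_min_size <- function(hc, counts, k) {
  counts <- as.numeric(counts)
  n_merge <- nrow(hc$merge)

  # In hc$merge, a negative entry -j is leaf j and a positive entry i
  # refers to the cluster formed at merge step i.
  leaves <- vector("list", n_merge)
  size <- numeric(n_merge)
  node_leaves <- function(j) if (j < 0) -j else leaves[[j]]
  for (i in seq_len(n_merge)) {
    leaves[[i]] <- c(node_leaves(hc$merge[i, 1]),
                     node_leaves(hc$merge[i, 2]))
    size[i] <- sum(counts[leaves[[i]]])
  }
  node_size <- function(j) if (j < 0) counts[-j] else size[j]

  membership <- integer(length(counts))
  next_id <- 0L
  stack <- n_merge                      # the last merge is the root
  while (length(stack) > 0) {
    node <- stack[1]
    stack <- stack[-1]
    kids <- if (node > 0) hc$merge[node, ] else integer(0)
    if (node > 0 && all(vapply(kids, node_size, numeric(1)) >= k)) {
      stack <- c(kids, stack)           # both children big enough: split
    } else {
      next_id <- next_id + 1L           # otherwise keep node as a cluster
      membership[node_leaves(node)] <- next_id
    }
  }
  names(membership) <- hc$labels
  membership
}

# Usage on made-up data (levels clustered on mean response, an assumption).
f <- factor(rep(letters[1:8], times = c(60, 50, 40, 20, 12, 8, 6, 4)))
set.seed(1)
y <- rnorm(length(f), mean = as.integer(f))
hc <- hclust(dist(tapply(y, f, mean)))
grp <- cut_min_size(hc, table(f)[hc$labels], k = 30)
f_new <- factor(grp[as.character(f)])
table(f_new)
```

Because a node is only split when both halves reach k, every final 
cluster has at least k observations as long as the whole data set does, 
at the cost of sometimes keeping a large cluster together when one of 
its children is too small.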

AFAIK, cutree cuts at a uniform height and there's no easy way of 
extracting the number of observations per cluster from hclust (except by 
assigning the new levels to the data and then counting the occurrences).
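
As a possible shortcut, the cluster sizes can be obtained from the 
per-level counts rather than from the relabelled data. The sketch below 
uses made-up data and again assumes the levels are clustered on their 
mean response.

```r
# Hypothetical data, for illustration only.
f <- factor(rep(letters[1:8], times = c(60, 50, 40, 20, 12, 8, 6, 4)))
set.seed(1)
y <- rnorm(length(f), mean = as.integer(f))
hc <- hclust(dist(tapply(y, f, mean)))
grp <- cutree(hc, k = 3)

# Observations per original level, ordered like the cutree result.
level_counts <- table(f)[names(grp)]

# Sum the level counts within each cluster.
cluster_counts <- tapply(as.vector(level_counts), grp, sum)
cluster_counts
```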

Does anyone know of code that does this already?

Thanks,
Gad

-- 
Gad Abraham
Dept. CSSE and NICTA
The University of Melbourne
Parkville 3010, Victoria, Australia
email: gabraham at csse.unimelb.edu.au
web: http://www.csse.unimelb.edu.au/~gabraham


