[R] How to use 'prcomp' with CLUSPLOT?
R. Michael Weylandt
michael.weylandt at gmail.com
Fri Nov 4 15:55:02 CET 2011
Hello Jo,
Full disclosure: I don't know much about clustering/partition cluster
analysis/etc., so I've only attacked this as an R problem. However, this
might get you going in the right direction:
df <- read.table(text = "PRVID,VAR1,VAR2,VAR3,VAR4,VAR5,VAR6,VAR7,VAR8,VAR9,VAR10,VAR11
PRV1,0,54463,53049,62847,75060,184925,0,0,0,0,0
PRV2,0,2100,76,131274,0,0,0,0,0,0,18
PRV3,967,0,0,0,0,0,0,0,0,3634,0
PRV4,817,18344,3274,9264,1862,0,0,141,0,0,0
PRV5,0,0,0,0,0,0,29044,0,0,0,0
PRV6,59,6924,825,3008,377,926,0,0,10156,0,5555
PRV7,11,24902,36040,47223,20086,0,0,749,415,0,0",
                 header = TRUE, sep = ",", stringsAsFactors = TRUE)
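library(cluster)
mat <- as.matrix(df[, -1])           # drop the ID column, keep the counts
rownames(mat) <- df$PRVID            # keep the provider IDs as point labels
newtble <- prop.table(mat, 1) * 100  # row percentages
num.clust <- 3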
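# Make local copies of the two functions involved: clusplot.default() and
# its internal helper mkCheckX(), which is where princomp() gets called.
clusplotMW <- cluster:::clusplot.default
mkCheckMW <- cluster:::mkCheckX
# In our copy of mkCheckX(), call prcomp() instead of princomp(). prcomp()
# stores its scores in $x rather than $scores, so if the helper reads
# $scores, point it at $x as well.
body(mkCheckMW) <- parse(text = gsub("$scores", "$x",
    gsub("princomp", "prcomp", deparse(body(mkCheckMW))), fixed = TRUE))
# Route our copy of clusplot to our modified mkCheckX().
body(clusplotMW) <- parse(text = gsub("mkCheckX", "mkCheckMW",
    deparse(body(clusplotMW))))
# Some versions of cluster call mkCheckX() from the default value of the
# s.x.2d argument rather than from the body; patch that too if present.
if ("s.x.2d" %in% names(formals(clusplotMW)))
    formals(clusplotMW)$s.x.2d <- quote(mkCheckMW(x, diss))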
fitnw <- kmeans(newtble, num.clust)  # k-means partition to be plotted
clusplotMW(newtble, fitnw$cluster, color = TRUE, shade = TRUE, lines = 0)
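The error Jo reports comes from princomp(), which works from the covariance (or correlation) matrix and refuses data with fewer units than variables; prcomp() is based on a singular value decomposition and has no such restriction. A minimal sketch of the difference, using a made-up random matrix `wide` with the same 7 x 11 shape as the sample data (hypothetical, not Jo's numbers). It also shows that prcomp() keeps its scores in `$x`, whereas princomp() uses `$scores`.

```r
# Hypothetical wide matrix: 7 rows (units) and 11 columns (variables),
# the same shape as the sample data but filled with random numbers.
set.seed(1)
wide <- matrix(runif(7 * 11), nrow = 7, ncol = 11)

# princomp() stops when there are fewer units than variables;
# capture its error message instead of aborting.
pc_err <- tryCatch(princomp(wide), error = function(e) conditionMessage(e))
print(pc_err)

# prcomp() uses an SVD of the centred data, so n < p is accepted.
# Its scores are stored in $x (princomp() uses $scores).
pr <- prcomp(wide)
print(dim(pr$x))   # 7 units by min(7, 11) = 7 components

# After centring, the data have rank at most n - 1 = 6, so the last
# component carries (numerically) zero variance.
print(pr$sdev[7])

# Share of total variance captured by the first two components,
# the quantity clusplot reports in its subtitle.
var2 <- sum(pr$sdev[1:2]^2) / sum(pr$sdev^2)
print(var2)
```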
Since you didn't provide a working example, I can't verify this, but
let me know if it works for you.
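An alternative that avoids patching cluster's internals, sketched under the assumption that your installed version of cluster has the `s.x.2d` argument described in ?clusplot.default (a list with components `x`, `labs` and `var.dec`): compute the components with prcomp() yourself and hand them to clusplot(). The names `pc2d` and `fit` are illustrative, `nstart = 10` is an arbitrary choice, and `scale. = TRUE` is an assumption meant to mimic princomp(cor = TRUE), which clusplot uses by default for more than two columns.

```r
library(cluster)

# Jo's sample data; the PRVID column becomes the row names (point labels).
df <- read.table(text = "PRVID,VAR1,VAR2,VAR3,VAR4,VAR5,VAR6,VAR7,VAR8,VAR9,VAR10,VAR11
PRV1,0,54463,53049,62847,75060,184925,0,0,0,0,0
PRV2,0,2100,76,131274,0,0,0,0,0,0,18
PRV3,967,0,0,0,0,0,0,0,0,3634,0
PRV4,817,18344,3274,9264,1862,0,0,141,0,0,0
PRV5,0,0,0,0,0,0,29044,0,0,0,0
PRV6,59,6924,825,3008,377,926,0,0,10156,0,5555
PRV7,11,24902,36040,47223,20086,0,0,749,415,0,0",
                 header = TRUE, sep = ",", row.names = 1)
newtbl <- prop.table(as.matrix(df), 1) * 100   # row percentages
num.clust <- 3

set.seed(42)                                   # k-means uses random starts
fit <- kmeans(newtbl, centers = num.clust, nstart = 10)

# PCA via prcomp(), which works with fewer rows than columns.
# scale. = TRUE is an assumption, meant to mimic princomp(cor = TRUE).
pr <- prcomp(newtbl, scale. = TRUE)

# Hypothetical helper list in the shape clusplot.default() expects for s.x.2d:
# 2-column scores, point labels, and the share of variance they explain.
pc2d <- list(x = pr$x[, 1:2],
             labs = rownames(newtbl),
             var.dec = sum(pr$sdev[1:2]^2) / sum(pr$sdev^2))

clusplot(newtbl, fit$cluster, s.x.2d = pc2d, color = TRUE, shade = TRUE,
         lines = 0,
         main = paste("Principal Components plot - Kmeans", num.clust, "Clusters"))
```

Because the components are supplied directly, clusplot() never calls princomp(), so nothing inside the cluster package has to be copied or edited.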
Michael
On Thu, Nov 3, 2011 at 8:10 PM, Jo Frabetti <jfrabetti at sdsc.edu> wrote:
> Hello,
>
> I have a large data set that has more columns than rows (sample data below). I am trying to perform a partitioning cluster analysis and then plot that using PCA. I have tried using CLUSPLOT(), but that only allows for 'princomp', whereas I need 'prcomp' as I do not want to reduce my columns. Is there a way to edit the CLUSPLOT() code to use 'prcomp', please?
>
> # sample of my data
> PRVID,VAR1,VAR2,VAR3,VAR4,VAR5,VAR6,VAR7,VAR8,VAR9,VAR10,VAR11
> PRV1,0,54463,53049,62847,75060,184925,0,0,0,0,0
> PRV2,0,2100,76,131274,0,0,0,0,0,0,18
> PRV3,967,0,0,0,0,0,0,0,0,3634,0
> PRV4,817,18344,3274,9264,1862,0,0,141,0,0,0
> PRV5,0,0,0,0,0,0,29044,0,0,0,0
> PRV6,59,6924,825,3008,377,926,0,0,10156,0,5555
> PRV7,11,24902,36040,47223,20086,0,0,749,415,0,0
>
> library(cluster)
> fn = "big.csv";
> tbl = read.table(fn, header=TRUE, sep=",", row.names=1);
> mat <- as.matrix(tbl);
> newtbl <- prop.table(mat,1)*100;
>
> num.clust <- 3;
> fitnw <- kmeans(newtbl, num.clust);
> clusplot(newtbl, fitnw$cluster, color=TRUE, shade=TRUE, lines=0, main= paste('Principal Components plot - Kmeans ', num.clust, ' Clusters') )
>
> Error in princomp.default(x, scores = TRUE, cor = ncol(x) != 2) :
> 'princomp' can only be used with more units than variables
>
> Thank you for R and any assistance you may offer!
>
> Jo
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>