[R-sig-Geo] Fuzzy k-means & raster

Ashton Shortridge ashton at msu.edu
Thu Mar 31 20:04:18 CEST 2011


Coincidentally enough, this issue has come up in my office this week. In fact, 
k-means solutions using the default settings in R appear to be quite unstable: 
different runs on a small and visually 'simple' data set I've been exploring 
may produce very different clusters and fits.

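Setting the nstart parameter to a larger value (e.g., 10 or 50; the default is 
1) seems to greatly improve the stability of the results. Of course this will 
extend the time it takes the function to run, since kmeans() repeats the fit 
nstart times from different random starting centres and keeps the solution with 
the lowest total within-cluster sum of squares; this mainly matters for large 
datasets.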
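The effect of nstart can be checked directly. The sketch below uses simulated data (`demo_data` is a hypothetical stand-in for the PC scores in `demdata`) and a hypothetical helper, `withinss_over_runs()`, that reruns kmeans() without reseeding and records the total within-cluster sum of squares of each run. With nstart = 25 the spread of those values should typically be much narrower than with nstart = 1, though the exact numbers depend on the data.

```r
# Sketch: compare run-to-run variability of kmeans() for nstart = 1 vs 25.
# `demo_data` is hypothetical simulated data standing in for `demdata`.
set.seed(42)
demo_data <- data.frame(
  PC1 = c(rnorm(300, 0), rnorm(300, 4), rnorm(300, 8)),
  PC2 = c(rnorm(300, 0), rnorm(300, 3), rnorm(300, 0))
)

# Hypothetical helper: run kmeans() n_runs times (no reseeding in between)
# and return the tot.withinss of each solution.
withinss_over_runs <- function(x, k, nstart, n_runs = 10) {
  vapply(seq_len(n_runs),
         function(i) kmeans(x, centers = k, nstart = nstart,
                            iter.max = 50)$tot.withinss,
         numeric(1))
}

single_start <- withinss_over_runs(demo_data, k = 7, nstart = 1)
multi_start  <- withinss_over_runs(demo_data, k = 7, nstart = 25)

# A narrower range means the runs are settling on similar solutions.
range(single_start)
range(multi_start)
```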

This also does not address the problem that the same set of points may form a 
cluster in different runs but receive a different cluster code (e.g. cluster 3 
in the first run and cluster 7 in the second).
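One common workaround for the label problem (not something proposed in this thread) is to recode the clusters after each run by a fixed rule, for instance by sorting the cluster centres. `relabel_by_centers()` below is a hypothetical helper sketching that idea. It only yields matching codes when the runs find essentially the same partition, and a cross-tabulation of two runs is a direct way to check that.

```r
# Hypothetical helper: recode kmeans() clusters so that code 1 is the
# cluster whose centre has the smallest first coordinate, code 2 the next,
# and so on (ties broken by the second coordinate, etc.).
relabel_by_centers <- function(fit) {
  # Columns of the centre matrix, passed to order() as sort keys.
  keys <- unname(as.list(as.data.frame(fit$centers)))
  ord <- do.call(order, keys)        # old codes in canonical order
  new_code <- integer(length(ord))
  new_code[ord] <- seq_along(ord)    # map old code -> canonical code
  new_code[fit$cluster]
}

# Usage sketch with hypothetical simulated data: two runs with different
# seeds, compared after relabelling.
set.seed(1)
x <- rbind(matrix(rnorm(200, mean = 0),  ncol = 2),
           matrix(rnorm(200, mean = 5),  ncol = 2),
           matrix(rnorm(200, mean = 10), ncol = 2))
set.seed(3); run_a <- kmeans(x, centers = 3, nstart = 10)
set.seed(7); run_b <- kmeans(x, centers = 3, nstart = 10)
table(relabel_by_centers(run_a), relabel_by_centers(run_b))
```

If the two runs agree, the table should have its non-zero counts only on the diagonal. Sorting by centre coordinates is just one possible canonical rule; centres that lie close together on the first axis may still swap order between slightly different solutions.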

I hope this is useful,

Ashton

On 2011-03-31, Marcelino de la Cruz, wrote:
> This is quite usual with kmeans as the algorithm may find different
> minima every time you call it.
> 
> If you want the same clusters every time, you should set the same seed
> before running kmeans, e.g.
> 
> set.seed(222)
> kmeans.dem5 <- kmeans(demdata,7)
> 
> 
> set.seed(222)
> kmeans.dem5 <- kmeans(demdata,7)
> 
> etc
> 
> Cheers,
> 
> Marcelino
> 
> At 10:33 31/03/2011, Andy Wilson wrote:
> >Hi all,
> >I'm using an example (p213) in Hengl's A Practical Guide to
> >Statistical Mapping (an excellent book btw) as the basis to use
> >fuzzy k-means for the unsupervised extraction of landforms from land
> >surface parameter rasters. I've found that the classes are not
> >reproducible. When I run the same code on the same data 5 times I
> >won't get the same clusters every time - sometimes the same, but
> >sometimes there will be different numbers of cells attributed to
> >each class. Is this to be expected from fuzzy k-means, or could this
> >be a problem with my approach?
> >
> >library(rgdal)   # readGDAL()
> >library(RSAGA)   # rsaga.sgrd.to.esri(), set.file.extension()
> >library(raster)  # raster(), setValues()
> >
> >grids50m <- readGDAL("ELEV.asc")
> >LSP.list <- c("RELELEV.asc", "PROFILE.asc")
> >rsaga.sgrd.to.esri(in.sgrds=set.file.extension(LSP.list, ".sgrd"),
> >out.grids=LSP.list, prec=4, out.path=getwd())
> >
> ># Add each land-surface parameter as a column named after its file
> >for(i in seq_along(LSP.list)){
> >  grids50m@data[sub(".asc", "", LSP.list[i], fixed=TRUE)] <-
> >    readGDAL(LSP.list[i])$band1
> >}
> >pc.dem <- prcomp(~ RELELEV+PROFILE, data=grids50m@data, scale.=TRUE)
> >demdata <- as.data.frame(pc.dem$x)
> >
> >kmeans.dem5 <- kmeans(demdata,7)
> >grids50m$kmeans.dem5 <- kmeans.dem5$cluster
> >grids50m$landform5 <- as.factor(kmeans.dem5$cluster)
> >
> ># Initialise a raster to load the cluster data into
> ># (assumed: r_elev is the elevation grid read as a RasterLayer)
> >r_elev <- raster("ELEV.asc")
> >r_class7 <- r_elev
> ># Extract the landform data into a vector
> >v_class7 <- as.numeric(grids50m$landform5)
> ># Load the landform data into the raster
> >r_class7 <- setValues(r_class7, v_class7)
> >
> >Many thanks for your advice...
> >Andy Wilson
> >
> >_______________________________________________
> >R-sig-Geo mailing list
> >R-sig-Geo at r-project.org
> >https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> 
> ____________________________________
> 
> Marcelino de la Cruz Rot
> Depto. Biologia Vegetal
> EUIT Agricola
> Universidad Politecnica de Madrid
> 
> tel: 34 + 913365435
> ____________________________________


-----
Ashton Shortridge
Associate Professor			ashton at msu.edu
Dept of Geography			http://www.msu.edu/~ashton
235 Geography Building		ph (517) 432-3561
Michigan State University		fx (517) 432-1671


