[R-sig-Geo] Skater grouping effectiveness, within- and between-group similarities
Michael O'Donnell
odonnems at gmail.com
Tue Sep 13 22:00:58 CEST 2016
Hi,
I am interested in calculating several statistics from skater{spdep} results for a SpatialPointsDataFrame, and I was wondering if someone could help me verify that what I have done is correct (Q1).
My objective is to evaluate the performance of the clustering across skater() runs that use different parameters. Specifically, I am not sure how to measure the within-group similarity; I believe the other statistics are defined correctly.
Also, can someone provide more details on the objects "not.prune" and "candidates" (Q2)?
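For reference, these are the standard definitions the Q1 code is meant to implement. Here T is the total dissimilarity (ssto), W is the within-group dissimilarity of the final partition (the "SSD" in the AIC comment), n is the number of nodes, k is the number of groups and p is cov_count:

```latex
R^2 = \frac{T - W}{T}, \qquad
F = \frac{R^2/(k-1)}{(1-R^2)/(n-k)} = \frac{(T-W)/(k-1)}{W/(n-k)},

\mathrm{AIC} = n \log\!\left(\frac{W}{n}\right) + 2p, \qquad
\mathrm{AICc} = \mathrm{AIC} + \frac{2p(p+1)}{n-p-1}.
```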
Q1 ------------------------------
These are the statistics that I would like to calculate:
library(spdep)
res1 <- skater(mst[, 1:2], data, ncuts) # Example skater object; mst from mstree(), all arguments are placeholders
# Total dissimilarity of the data before any cut
sst <- res1$ssto
# The within-group dissimilarity (this is the part I am unsure about)
sse <- sum(res1$ssw)/max(res1$groups)
# R2: share of the total dissimilarity explained by the grouping
R2 <- (sst - sse)/sst
# AIC, AICc
# AIC  = n*log(SSD/n) + 2*cov_count
# AICc = AIC + 2*cov_count*(cov_count+1)/(n - cov_count - 1)
cov_count <- 1 # Number of covariates considered by skater and provided in data
n <- nrow(shape2) # Node count; shape2 is the SpatialPointsDataFrame
aic <- n * log(sse/n) + 2.0 * cov_count
aicc <- aic + 2.0 * cov_count * (cov_count + 1.0)/(n - cov_count - 1.0)
# Calinski-Harabasz pseudo F-statistic
nc <- max(res1$groups) # Number of groups
fstat <- (R2/(nc - 1))/((1 - R2)/(n - nc))
# Review
print(c(aic = aic, aicc = aicc, fstat = fstat, R2 = R2))
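As a cross-check that does not depend on how skater() stores res1$ssw, the same statistics can be recomputed from the clustering variables and the final group vector. This is a minimal base-R sketch under stated assumptions. The helper name cluster_stats() is hypothetical. It uses squared Euclidean distance to each group mean as the dissimilarity, which may differ from the measure selected by skater()'s method argument, so its sst and sse need not equal res1$ssto or skater's within-group values. It may also be worth checking with str(res1) whether res1$ssw holds one total within-group value per cut, with its first element equal to res1$ssto. If it does, the within-group value for the final partition would be its last element rather than sum(res1$ssw)/max(res1$groups).

```r
# Hypothetical helper: clustering diagnostics from data and a group vector.
# Assumption: dissimilarity = squared Euclidean distance to the group mean;
# skater()'s own measure (its 'method' argument) may differ.
cluster_stats <- function(x, groups, cov_count = ncol(as.matrix(x))) {
  x <- as.matrix(x)
  n <- nrow(x)
  nc <- length(unique(groups))
  # Total dissimilarity: squared distances of all rows to the grand mean
  sst <- sum(sweep(x, 2, colMeans(x))^2)
  # Within-group dissimilarity: squared distances to each group's own mean
  sse <- sum(sapply(split(seq_len(n), groups), function(idx) {
    xi <- x[idx, , drop = FALSE]
    sum(sweep(xi, 2, colMeans(xi))^2)
  }))
  r2 <- (sst - sse) / sst
  # Calinski-Harabasz pseudo F, equal to (SSB/(k-1)) / (SSW/(n-k))
  fstat <- (r2 / (nc - 1)) / ((1 - r2) / (n - nc))
  aic <- n * log(sse / n) + 2 * cov_count
  aicc <- aic + 2 * cov_count * (cov_count + 1) / (n - cov_count - 1)
  c(sst = sst, sse = sse, R2 = r2, fstat = fstat, aic = aic, aicc = aicc)
}

# Toy data: one variable, two well-separated groups
x <- c(1, 2, 3, 10, 11, 12)
groups <- c(1, 1, 1, 2, 2, 2)
print(cluster_stats(x, groups)) # sst = 125.5, sse = 4, fstat = 121.5
```

With real data this would be called as cluster_stats(shape2@data[, vars], res1$groups), where vars (hypothetical) names the columns that were passed to skater().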
Q2 ------------------------------
What exactly do "not.prune" and "candidates" contain? For example, is candidates a list of cluster groups that are statistically significant, while not.prune is a list of nodes that did not get assigned to a group? I have not been able to find enough documentation on these objects, and I am not sure how to interpret them.
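One way to start interpreting these objects is to compare group sizes with the minimum-size criterion used in the run. This is a minimal sketch, assuming crit acts as a minimum number of nodes per group (vec.crit left at its default). Under that assumption, a group with fewer than 2 * crit nodes cannot be cut into two groups that both meet crit, so it could not be a candidate for further partitioning. The function too_small_to_split() is hypothetical, and whether this is exactly how skater fills not.prune should be checked against the spdep source.

```r
# Hypothetical helper: flag groups that cannot be split into two parts
# that both satisfy a minimum group size 'crit' (assumed to be a node count).
too_small_to_split <- function(groups, crit) {
  sizes <- table(groups)
  # A valid split needs at least crit nodes on each side, i.e. 2 * crit in total
  flagged <- names(sizes)[sizes < 2 * crit]
  as.integer(flagged)
}

# Toy example: groups of size 3, 6 and 1 with crit = 3
groups <- c(1, 1, 1, 2, 2, 2, 2, 2, 2, 3)
print(too_small_to_split(groups, crit = 3)) # groups 1 and 3

# With a real run, compare e.g. too_small_to_split(res1$groups, crit) to res1$not.prune
```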
Thank you for your assistance,
Mike