[R] Cross-checking a custom function for separability indices
Nikos Alexandris
nikos.alexandris at felis.uni-freiburg.de
Wed May 5 04:58:13 CEST 2010
Nikos:
> Hi list!
>
> I have prepared a custom function (below) in order to calculate
> separability indices (Divergence, Bhattacharyya, Jeffries-Matusita,
> Transformed divergene) between two samples of (spectral land cover)
> classes.
I've found a mistake: initially I used "log10" instead of "log" in the
definition of the Bhatacharryya index.
Thanks to a friend we cross-compared results of the (corrected) custom
function(s) (posted here) and the results derived from ERDAS Imagine (a
proprietary remote sensing box) and they agree concerning the Bhattacharyya
index (and successively the Jeffries-Matusita index). Not sure about the
Divergence (and successively the Transformed Divergence) though.
There is a subtle difference worthwhile to mention:
ERDAS Imagine rejected an observation (when calculating the "mean" of the
sample(s)). One observation less due to... "pixel inclusion/exclusion" rules?
No idea. Needless to say, using R one has total control (given he knows what
is going on).
Also, I've expanded this mini personal project by writing more functions in
order to get an output that fits my current needs. Functions are attached as
".R" files and explanations (along with some questions of course) are given
below [*]. Some things are hardcoded since I have little experience on writing
generic functions.
Would be nice if someone is interested in this and can provide assistance to
make it better, more generic and useful.
Thanks, Nikos
---
> # two samples
> sample.1 <- c (1362, 1411, 1457, 1735, 1621, 1621, 1791, 1863, 1863, 1838)
> sample.2 <- c (1354, 1458, 1458, 1458, 1550, 1145, 1428, 1573, 1573, 1657)
> # running the custom function (below)
> separability.measures ( sample.1 , sample.2 )
> Divergence: 1.5658
> Bhattacharryya: 0.1683
> Jeffries-Matusita: 0.3098
> Transformed divergence: 0.3555
after correction the results are:
Divergence: 1.5658
Transformed divergence: 0.3555
Bhattacharryya: 0.1805
Jeffries-Matusita: 0.3302
---
[*] Custom functions for separability measures. Note: the working directory,
location(s) for results export (as csv) as well as the location of each script
that is to be sourced are hard-coded.
1. separability.measures <- function ( Vector.1 , Vector.2 )
comments:
- converts input vectors to matrices
2. separability.measures.info <- function (
Data.Frame.1 ,
Data.Frame.2 ,
Data.Frame.1.label ,
Data.Frame.2.label ,
i ,
print = FALSE
)
comments:
- calls "separability.measures()" and prints out more info about what is
compared
3. separability.measures.class <- function (
Data.Frame.1 ,
Data.Frame.2 ,
print = FALSE
)
comments:
- calls "separability.measures.info.R()"
- calculates separability measures between samples in "data.frames" with
identical structure
4. separability.matrix <- function (Data.Frame.M1,Data.Frame.M2)
comments:
- calls "separability.measures.class()"
- constructs a matrix where
- rows are separability indices
- columns equal the "dim (data.frame) [2]" (any of the identical input
data.frames) and hold... well, the separability measures!
5. separability.matrices <- function ( Reference.Data.Frame ,
Target.Data.Frame.A ,
Target.Data.Frame.B
)
comments:
- calls "separability.matrix()"
- print out something that is meant to ease off reading/comparing the
results _in case_ one wants to compare differences between one (call it)
reference sample and two (call them) target samples, e.g. calculate indices
between:
- sample.1 and sample.2
- sample.1 and sample.3
- compare the above results
(
The output of "separability.matrix()" displays the separability measures on a
per row-basis ( each row holds results for one index) where:
- in the 1st column are the results (separabilities) for "sample.1 vs.
sample.2", for variable 1 of the (identically structured) input data.frames
- in the 2nd column are the ones for "sample.1 vs. sample.3" for variable 1
of the (identically structured) input data.frames
- in the 3rd column are the ones for "sample.1 vs. sample.2" for variable 2
of the (identically structured) input data.frames
- etc.
)
- what is missing is a mechanism that gives proper names for the columns. I
imagine(d) names like (for column 1) "sample.1 vs sample.2, variable1",
"sample.1 vs sample.3, variable 1" , "sample.1 vs sample.2, variable 2", etc.
But (a) I am not sure how to implement it and (b) they are _too_ long. So they
just are numbered as usual [1] [2] [3]...
6. separability.matrices_multiref <- function ( Reference.Data.Frame.A ,
Target.Data.Frame.A ,
Reference.Data.Frame.B ,
Target.Data.Frame.B
)
comments:
- same as function 5 (above) but accepts 4 different samples (as data.frames
of course).
Questions:
If you made it read this post till here, maybe you could give hints on how to
handle:
1. the definition of the directory path that leads to the custom functions
that are required to be source(d)? Currently hard-coded at a specific
directory where I store custom functions. Is it better to make 1 _big_ script
which includes all functions? What is good coding practice in such occasions ?
2. "column naming" after a column by column merging of three different
matrices (data.frames)? This concerns "separability.matrices()".
3. the long command built-up using "assign()" and "else()" ? For some reason
the functions fails to run if the "else(...)" statement is moved in the next
line. For this reason the script "separability.matrix.R" does not respect the
"78 characters" width limit.
-------------- next part --------------
# Custom function for various Separability Measures
# by Nikos Alexandris, Freiburg, 8.04.2010
# ( based on Divergence and Jeffries-Matusita, requires input variables "as.matrices" )
separability.measures <- function ( Vector.1 , Vector.2 ) {
# convert vectors to matrices in case they are not
Matrix.1 <- as.matrix (Vector.1)
Matrix.2 <- as.matrix (Vector.2)
# define means
mean.Matrix.1 <- mean ( Matrix.1 )
mean.Matrix.2 <- mean ( Matrix.2 )
# define difference of means
mean.difference <- mean.Matrix.1 - mean.Matrix.2
# define covariances for supplied matrices
cv.Matrix.1 <- cov ( Matrix.1 )
cv.Matrix.2 <- cov ( Matrix.2 )
# define the halfsum of cv's as "p"
p <- ( cv.Matrix.1 + cv.Matrix.2 ) / 2
# --%<------------------------------------------------------------------------
# calculate the Bhattacharryya index
bh.distance <- 0.125 *
t ( mean.difference ) *
p^ ( -1 ) *
mean.difference +
0.5 * log (
det ( p ) / sqrt (
det ( cv.Matrix.1 ) *
det ( cv.Matrix.2 )
)
)
# --%<------------------------------------------------------------------------
# calculate Jeffries-Matusita
# following formula is bound between 0 and 2.0
jm.distance <- 2 * ( 1 - exp ( -bh.distance ) )
# also found in the bibliography:
# jm.distance <- 1000 * sqrt ( 2 * ( 1 - exp ( -bh.distance ) ) )
# the latter formula is bound between 0 and 1414.0
# --%<------------------------------------------------------------------------
# calculate the divergence
# trace (is the sum of the diagonal elements) of a square matrix
trace.of.matrix <- function ( SquareMatrix ) {
sum ( diag ( SquareMatrix ) ) }
# term 1
divergence.term.1 <- 1/2 *
trace.of.matrix (
( cv.Matrix.1 - cv.Matrix.2 ) *
( cv.Matrix.2^ (-1) - cv.Matrix.1^ (-1) )
)
# term 2
divergence.term.2 <- 1/2 *
trace.of.matrix (
( cv.Matrix.1^ (-1) + cv.Matrix.2^ (-1) ) *
( mean.Matrix.1 - mean.Matrix.2 ) *
t ( mean.Matrix.1 - mean.Matrix.2 )
)
# divergence
divergence <- divergence.term.1 + divergence.term.2
# --%<------------------------------------------------------------------------
# and the transformed divergence
transformed.divergence <- 2 * ( 1 - exp ( - ( divergence / 8 ) ) )
# --%<------------------------------------------------------------------------
# print results --- look at "separability.measures.pp"
# get "column" names (hopefully they exist) --- to use for/with... ??? ---
name.of.Matrix.1 <- colnames ( Matrix.1 )
name.of.Matrix.2 <- colnames ( Matrix.2 )
# create new objects to be used with... ?
assign ( "divergence.vector" , divergence ,
envir = .GlobalEnv)
assign ( "jm.distance.vector" , jm.distance ,
envir = .GlobalEnv)
assign ( "bh.distance.vector" , bh.distance ,
envir = .GlobalEnv )
assign ( "transformed.divergence.vector" , transformed.divergence ,
envir = .GlobalEnv )
# print some message --- ?
# a "just do it" solution using cat()
cat ( paste ( "Divergence: " ,
round ( divergence , digits = 4 ) , sep = "" ) , "\n")
cat ( paste ( "Transformed divergence: " ,
round ( transformed.divergence , digits = 4 ) , sep = "" ) , "\n" )
cat ( paste ( "Bhattacharryya: " ,
round ( bh.distance , digits = 4 ) , sep = "" ) , "\n")
cat ( paste ( "Jeffries-Matusita: " ,
round ( jm.distance , digits = 4 ) , sep = "" ) , "\n")
cat ("\n")
}
# --%<------------------------------------------------------------------------
# Sources
# Richards, John A., Jia, X., "Remote Sensing Digital Image Analysis: ...", Springer
# Wikipedia
# --%<------------------------------------------------------------------------
# Details
# Bhattacharyya distance measures the similarity of two discrete or
# continuous probability distributions.
# (from: http://en.wikipedia.org/wiki/Bhattacharyya_distance)
# Divergence [in vector calculus] is an operator that measures the magnitude
# of a vector field's source or sink at a given point...
# (from: http://en.wikipedia.org/wiki/Divergence)
# Jeffries-Matusita ...
# add info...
# Transformed divergence ...
# add info...
# --%<------------------------------------------------------------------------
# Acknowledgements
# Augustin Lobo for initial help with the Bhattacharryya Index
# Nikos Koutsias
# useRs
# many more people...
-------------- next part --------------
# Custom function for various Separability Measures
# separability measures between data.frames with identical structure
# by Nikos Alexandris, Freiburg, April 2010
# Code ---------------------------------------------------------------------- >
separability.measures.class <- function (
Data.Frame.1 ,
Data.Frame.2 ,
print = FALSE
) {
# where to look for custom (sub-)functions?
setwd ("~/R/code/functions/")
# DEPENDS on:
# separability.measures.info()
source ("separability.measures.info.R")
# create labels from names of input(ted) data.frames
Data.Frame.1.label <- deparse ( substitute ( Data.Frame.1 ) )
Data.Frame.2.label <- deparse ( substitute ( Data.Frame.2 ) )
# create new vectors to carry results that will be fed in the final matrix
assign ( "divergence.vectorS" ,
numeric(0) , envir = .GlobalEnv )
assign ( "transformed.divergence.vectorS" ,
numeric(0) , envir = .GlobalEnv )
assign ( "bh.distance.vectorS" ,
numeric(0) , envir = .GlobalEnv )
assign ( "jm.distance.vectorS" ,
numeric(0) , envir = .GlobalEnv )
# loop over number of columns of the data.frame(s)
for ( i in 1 : dim ( Data.Frame.1 ) [2] )
# it does not matter which data.frame ( 1st or 2nd ) is used
# they should be of the same structure otherwise its pointless
# call "separability.measures.info" which calls "separability.measures"
separability.measures.info (
Data.Frame.1 ,
Data.Frame.2 ,
Data.Frame.1.label ,
Data.Frame.2.label ,
i
)
}
#< ----------------------------------------------------------------------- Code
-------------- next part --------------
# Custom function for various Separability Measures
# construct a bi-sample matrix comparing separabilities column-by-column... !@#?
# "rows = separability.indices" and "columns = ( dim (data.frame) [2] ) * 2"
# by Nikos Alexandris, Freiburg, April 2010
# Code ---------------------------------------------------------------------- >
# requires 3 input data.frames
separability.matrices <- function ( Reference.Data.Frame ,
Target.Data.Frame.A ,
Target.Data.Frame.B
) {
# suppress output -----------------------------------------------------------/
sink ( "/dev/null" )
# where
# Data.Frame.1 -> reference
# Data.Frame.2 -> target A
# Data.Frame.2 -> target B
# UNCOMMENT below to be used for writing results to a csv file -- -- -- -- --
# function name --- to be used for the exported csv filename
separability.function.name <- sys.call (1)
#print ( separability.function.name )
separability.function.name <- gsub ( " " , "_" , separability.function.name )
#print ( separability.function.name )
separability.function.name <- gsub ( "," , "_" , separability.function.name )
#print ( separability.function.name )
# filename
separability.filename <- paste (
c (separability.function.name) ,
collapse = "_"
)
#print ( separability.filename )
separability.filename.csv <- paste ( separability.filename ,
".csv" ,
sep = ""
)
#print ( separability.filename.csv )
# -- -- -- -- UNCOMMENT above to be used for writing results to a csv file --
# names of rows --- pass as an argument to matrix() directly ---
separability.indices <- list ( c ( "Divergence" ,
"Transformed divergence" ,
"Bhattacharryya" ,
"Jeffries-Matusita"
)
)
# number of rows is "fixed" !
separability.Matrix.rows <- length ( unlist ( separability.indices ) )
# create empty matrix to carry
# assign (
# "separability.Matrix.X" ,
# matrix (
# numeric (0) ,
# nrow = separability.Matrix.rows ,
# ncol = dim ( Reference.Data.Frame) [2] * 2 ,
# ) ,
# envir = .GlobalEnv
# )
# is it there?
# print ( separability.Matrix.X )
# get separabilities between reference and each target
# source the "parent" function
source ( "~/R/code/functions/separability.matrix.R" )
# reference ~ target A
separability.matrix ( Reference.Data.Frame , Target.Data.Frame.A )
separability.Matrix.A <- separability.Matrix.X
# reference ~ target B
separability.matrix ( Reference.Data.Frame , Target.Data.Frame.B )
separability.Matrix.B <- separability.Matrix.X
# combine results in on matrix
assign ( "separability.Matrices",
matrix (
rbind (
separability.Matrix.A ,
separability.Matrix.B
) ,
ncol = dim (separability.Matrix.A) [2] * 2
)
)
# UNsuppress output ---------------------------------------------------------/
sink ()
# row names
rownames (separability.Matrices) <- unlist (separability.indices)
# write to csv
write.csv (
round (separability.Matrices , digits = 3) ,
file = separability.filename.csv ,
eol = "\n",
)
# print out
round ( separability.Matrices , digits = 3 )
}
#< ----------------------------------------------------------------------- Code
# Sources
# combine matrices column by column
# <http://promberger.info/linux/2007/07/09/r-combine-two-matrices-column-by-column/comment-page-1/#comment-37240>
-------------- next part --------------
# Custom function for various Separability Measures
# more info --- used within a for() loop
# by Nikos Alexandris, Freiburg, April 2010
# Code ---------------------------------------------------------------------- >
separability.measures.info <- function (
Data.Frame.1 ,
Data.Frame.2 ,
Data.Frame.1.label ,
Data.Frame.2.label ,
i ,
print = FALSE
) {
# where to find custom (sub-)function(s)
setwd ( "~/R/code/functions/" )
# DEPENS on:
# separability.measures()
source ("separability.measures.R")
# pretty print ( is it really pretty? )
cat (
"Separability measures between" ,
"\n" , "\n" , " - \"" , colnames ( Data.Frame.1 ) [i] ,
"\" of \"" , Data.Frame.1.label , "\"" ,
"\n" , " and" ,
"\n" , " - \"" , colnames ( Data.Frame.2 ) [i] ,
"\" of \"" , Data.Frame.2.label , "\"" ,
sep = ""
)
# add empty (new)line
cat (
"\n" , "\n"
)
# calculate separability measures
separability.measures ( Data.Frame.1 [i] , Data.Frame.2 [i] )
# fill the required vectors for the final matrix
assign ( "divergence.vectorS" ,
append ( divergence.vectorS , divergence.vector ) ,
envir = .GlobalEnv )
assign ( "transformed.divergence.vectorS" ,
append ( transformed.divergence.vectorS , transformed.divergence.vector) ,
envir = .GlobalEnv )
assign ( "bh.distance.vectorS" ,
append ( bh.distance.vectorS , bh.distance.vector ) ,
envir = .GlobalEnv )
assign ( "jm.distance.vectorS" ,
append ( jm.distance.vectorS , jm.distance.vector ) ,
envir = .GlobalEnv )
# print something to ease off visual separation of output
cat (
"--%<---" ,
"\n"
)
}
#< --------------------------------------------------------------------- Code #
-------------- next part --------------
# Custom function for various Separability Measures
# construct a matrix with:
# "rows = separability.indices" and "columns = dim (data.frame) [2]"
# by Nikos Alexandris, Freiburg, April 2010
# Code ---------------------------------------------------------------------- >
separability.matrix <- function (Data.Frame.M1,Data.Frame.M2) {
# where to look for custom (sub-)functions?
setwd ( "~/R/code/functions/" )
# DEPENDS on: separability.measures.class()
source ( "separability.measures.class.R" , local = FALSE )
# UNCOMMENT below to be used for writing results to a csv file ---------------
# function name --- to be used for the exported csv filename
separability.function.name <- sys.call (1)
#print ( separability.function.name )
separability.function.name <- gsub ( " " , "_" , separability.function.name )
#print ( separability.function.name )
separability.function.name <- gsub ( "," , "_" , separability.function.name )
#print ( separability.function.name )
# filename
separability.filename <- paste (
c (separability.function.name) ,
collapse = "_"
)
#print ( separability.filename )
separability.filename.csv <- paste ( separability.filename ,
".csv" ,
sep = ""
)
#print ( separability.filename.csv )
# ------------ UNCOMMENT above to be used for writing results to a csv file --
# names of rows --- pass as an argument to matrix() directly ---
separability.indices <- list ( c ( "Divergence" ,
"Transformed divergence" ,
"Bhattacharryya" ,
"Jeffries-Matusita"
)
)
# number of rows is "fixed" !
separability.Matrix.rows.number <- length ( unlist ( separability.indices ) )
# number of columns depends on the data frames
if ( dim (Data.Frame.M1) [2] == ( dim (Data.Frame.M2) [2] ) )
assign ( "separability.Matrix.columns.number" , dim (Data.Frame.M1) [2] , envir = .GlobalEnv ) else cat ( "Warning: number of columns differs between the 2 data frames (while it should be the same?)." , "\n" )
# construct a matrix of [ Matrix.rows x Matrix.columns ]
assign (
"separability.Matrix" ,
matrix (
numeric (0) ,
nrow = separability.Matrix.rows.number ,
ncol = separability.Matrix.columns.number ,
dimnames = separability.indices
) ,
envir = .GlobalEnv
)
# remove unnecessary objects
rm ( separability.Matrix.rows.number )
#rm ( separability.Matrix.columns.number )
rm ( separability.indices )
# names of columns derived from the data frames
if ( any ( colnames (Data.Frame.M1) == colnames (Data.Frame.M2) ) )
colnames (separability.Matrix) <- colnames (Data.Frame.M1) else cat ( "Warning: names of columns differ between the 2 data frames (while they should be the same?)." , "\n" , "\n" )
# suppress print-outs
sink ( "/dev/null" )
# get separability measures
separability.measures.class ( Data.Frame.M1 , Data.Frame.M2 )
# un-suppress print-outs
sink ()
# fill the Matrix
separability.Matrix [1, ] <- divergence.vectorS
separability.Matrix [2, ] <- transformed.divergence.vectorS
separability.Matrix [3, ] <- bh.distance.vectorS
separability.Matrix [4, ] <- jm.distance.vectorS
# ---------------------------------------------------------------------------/
# # make it available globally (for the next function): is this a bad idea?
assign (
"separability.Matrix.X" ,
separability.Matrix ,
envir = .GlobalEnv
)
# ---------------------------------------------------------------------------/
# write to csv file --- UNCOMMENT to write to file
# getwd()
# hardcoded: set _some_ working directory
setwd ( "~/R/data/separabilities/" )
# write to csv
write.csv (
round (separability.Matrix , digits = 3) ,
file = separability.filename.csv ,
eol = "\n",
)
# print the Matrix
round ( separability.Matrix , digits = 3 )
}
#< ----------------------------------------------------------------------- Code
# Comments
# The Jeffries-Matusita (and successively the Bhattacharryya) distance
# has been cross-checked with ERDAS Imagine's respective tool.
# The results are highly accurate. ERDAS "prefers" for some reason, _not_ to
# include all observations for the calculations of mean(s) and
# variance(s)-covariance(s)
-------------- next part --------------
# Custom function for various Separability Measures
# construct a bi-sample matrix comparing separabilities column-by-column... !@#?
# "rows = separability.indices" and "columns = ( dim (data.frame) [2] ) * 2"
# for samples derived from different "sources" <<< explain better please!!!
# by Nikos Alexandris, Freiburg, April 2010
# Code ---------------------------------------------------------------------- >
# requires 4 input data.frames
separability.matrices_multiref <- function ( Reference.Data.Frame.A ,
Target.Data.Frame.A ,
Reference.Data.Frame.B ,
Target.Data.Frame.B
) {
# suppress output -----------------------------------------------------------/
sink ( "/dev/null" )
# where
# Reference.Data.Frame.A -> reference.A
# Target.Data.Frame.A -> target A
# Reference.Data.Frame.B -> reference.B
# Target.Data.Frame.B -> target B
# names of rows --- pass as an argument to matrix() directly ---
separability.indices <- list ( c ( "Divergence" ,
"Transformed divergence" ,
"Bhattacharryya" ,
"Jeffries-Matusita"
)
)
# number of rows is "fixed" !
separability.Matrix.rows <- length ( unlist ( separability.indices ) )
# get separabilities between reference and each target
# source the "parent" function
source ( "~/R/code/functions/separability.matrix.R" )
# reference ~ target A
separability.matrix ( Reference.Data.Frame.A , Target.Data.Frame.A )
separability.Matrix.A <- separability.Matrix.X
# reference ~ target B
separability.matrix ( Reference.Data.Frame.B , Target.Data.Frame.B )
separability.Matrix.B <- separability.Matrix.X
# sort and print out results
# combine results in on matrix
assign ( "separability.Matrices",
matrix (
rbind (
separability.Matrix.A ,
separability.Matrix.B
) ,
ncol = dim (separability.Matrix.A) [2] * 2
)
)
# UNsuppress output ---------------------------------------------------------/
sink ()
# row names
rownames (separability.Matrices) <- unlist (separability.indices)
# print out
round ( separability.Matrices , digits = 3 )
}
#< ----------------------------------------------------------------------- Code
More information about the R-help
mailing list