[R] Cross-checking a custom function for separability indices

Wed May 5 04:58:13 CEST 2010

Nikos:
> Hi list!
> 
> I have prepared a custom function (below) in order to calculate
> separability indices (Divergence, Bhattacharyya, Jeffries-Matusita,
> Transformed divergene) between two samples of (spectral land cover)
> classes.

I've found a mistake: initially I used "log10" instead of "log" in the 
definition of the Bhatacharryya index.

Thanks to a friend we cross-compared results of the (corrected) custom 
function(s) (posted here) and the results derived from ERDAS Imagine (a 
proprietary remote sensing box) and they agree concerning the Bhattacharyya 
index (and successively the Jeffries-Matusita index). Not sure about the 
Divergence (and successively the Transformed Divergence) though.

There is a subtle difference worthwhile to mention:

ERDAS Imagine rejected an observation (when calculating the "mean" of the 
sample(s)). One observation less due to... "pixel inclusion/exclusion" rules? 
No idea. Needless to say, using R one has total control (given he knows what 
is going on).

Also, I've expanded this mini personal project by writing more functions in 
order to get an output that fits my current needs.  Functions are attached as 
".R" files and explanations (along with some questions of course) are given 
below [*]. Some things are hardcoded since I have little experience on writing 
generic functions.

Would be nice if someone is interested in this and can provide assistance to 
make it better, more generic and useful.

Thanks, Nikos

---
> # two samples
> sample.1 <- c (1362, 1411, 1457, 1735, 1621, 1621, 1791, 1863, 1863, 1838)
> sample.2 <- c (1354, 1458, 1458, 1458, 1550, 1145, 1428, 1573, 1573, 1657)

> # running the custom function (below)
> separability.measures ( sample.1 , sample.2 )

> Divergence:             1.5658
> Bhattacharryya:         0.1683
> Jeffries-Matusita:      0.3098
> Transformed divergence: 0.3555

after correction the results are:

Divergence:             1.5658 
Transformed divergence: 0.3555 
Bhattacharryya:         0.1805 
Jeffries-Matusita:      0.3302 

---
[*] Custom functions for separability measures. Note: the working directory, 
location(s) for results export (as csv) as well as the location of each script 
that is to be sourced are hard-coded.

1. separability.measures <- function ( Vector.1 , Vector.2 )
  comments:
  - converts input vectors to matrices

2. separability.measures.info <- function (
                                      Data.Frame.1 ,
                                      Data.Frame.2 ,
                                      Data.Frame.1.label ,
                                      Data.Frame.2.label ,
                                                       i ,
                                      print = FALSE
                                      )

  comments:
  - calls "separability.measures()" and prints out more info about what is 
compared

3. separability.measures.class <- function (
                                        Data.Frame.1 ,
                                        Data.Frame.2 ,
                                        print = FALSE 
                                        )

  comments:
  - calls "separability.measures.info.R()"
  - calculates separability measures between samples in "data.frames" with 
identical structure

4. separability.matrix <- function (Data.Frame.M1,Data.Frame.M2)
  comments:
  - calls "separability.measures.class()"
  - constructs a matrix where
    - rows are separability indices 
    - columns equal the "dim (data.frame) [2]" (any of the identical input 
data.frames) and hold... well, the separability measures!

5.  separability.matrices <- function ( Reference.Data.Frame ,
                                    Target.Data.Frame.A ,
                                    Target.Data.Frame.B
                                  )

  comments:
  - calls "separability.matrix()"
  - print out something that is meant to ease off reading/comparing the 
results _in case_ one wants to compare differences between one (call it) 
reference sample and two (call them) target samples, e.g. calculate indices 
between:
    - sample.1 and sample.2
    - sample.1 and sample.3
    - compare the above results

(
The output of "separability.matrix()" displays the separability measures on a 
per row-basis ( each row holds results for one index) where:

  - in the 1st column are the results (separabilities) for "sample.1 vs. 
sample.2", for variable 1 of the (identically structured) input data.frames

  - in the 2nd column are the ones for "sample.1 vs. sample.3" for variable 1 
of the (identically structured) input data.frames

  - in the 3rd column are the ones for "sample.1 vs. sample.2" for variable 2 
of the (identically structured) input data.frames

 - etc.
)

  - what is missing is a mechanism that gives proper names for the columns. I 
imagine(d) names like (for column 1) "sample.1 vs sample.2, variable1", 
"sample.1 vs sample.3, variable 1" , "sample.1 vs sample.2, variable 2", etc. 
But (a) I am not sure how to implement it and (b) they are _too_ long. So they 
just are numbered as usual [1] [2] [3]...

6. separability.matrices_multiref <- function ( Reference.Data.Frame.A ,
                                    Target.Data.Frame.A ,
                                    Reference.Data.Frame.B ,
                                    Target.Data.Frame.B 
                                  )

  comments:
  - same as function 5 (above) but accepts 4 different samples (as data.frames 
of course).

Questions:

If you made it read this post till here, maybe you could give hints on how to 
handle:

1. the definition of the directory path that leads to the custom functions 
that are required to be source(d)? Currently hard-coded at a specific 
directory where I store custom functions. Is it better to make 1 _big_ script 
which includes all functions? What is good coding practice in such occasions ?

2. "column naming" after a column by column merging of three different 
matrices (data.frames)? This concerns "separability.matrices()".

3. the long command built-up using "assign()" and "else()" ? For some reason 
the functions fails to run if the "else(...)" statement is moved in the next 
line. For this reason the script "separability.matrix.R" does not respect the 
"78 characters" width limit.
-------------- next part --------------
# Custom function for various Separability Measures
  # by Nikos Alexandris, Freiburg, 8.04.2010

# ( based on Divergence and Jeffries-Matusita, requires input variables "as.matrices" )
separability.measures <- function ( Vector.1 , Vector.2 ) {

    # convert vectors to matrices in case they are not
    Matrix.1 <- as.matrix (Vector.1)
    Matrix.2 <- as.matrix (Vector.2)

    # define means
    mean.Matrix.1 <- mean ( Matrix.1 )
    mean.Matrix.2 <- mean ( Matrix.2 )

    # define difference of means
    mean.difference <- mean.Matrix.1 - mean.Matrix.2

    # define covariances for supplied matrices
    cv.Matrix.1 <- cov ( Matrix.1 )
    cv.Matrix.2 <- cov ( Matrix.2 )

    # define the halfsum of cv's as "p"
    p <- ( cv.Matrix.1 + cv.Matrix.2 ) / 2

# --%<------------------------------------------------------------------------
    # calculate the Bhattacharryya index
    bh.distance <- 0.125 *
                         t ( mean.difference ) *
                         p^ ( -1 ) *
                         mean.difference +
                         0.5 * log (
                                     det ( p ) / sqrt (
                                                      det ( cv.Matrix.1 ) *
                                                      det ( cv.Matrix.2 )
                                                      )
                                     )

# --%<------------------------------------------------------------------------
    # calculate Jeffries-Matusita
    # following formula is bound between 0 and 2.0
    jm.distance <- 2 * ( 1 - exp ( -bh.distance ) )

    # also found in the bibliography:
    # jm.distance <- 1000 * sqrt (   2 * ( 1 - exp ( -bh.distance ) )   )
    # the latter formula is bound between 0 and 1414.0

# --%<------------------------------------------------------------------------
    # calculate the divergence
      # trace (is the sum of the diagonal elements) of a square matrix
      trace.of.matrix <- function ( SquareMatrix ) {
                     sum ( diag ( SquareMatrix ) ) }

      # term 1
      divergence.term.1 <- 1/2 *
                 trace.of.matrix (
                                  ( cv.Matrix.1 - cv.Matrix.2 ) *
                                  ( cv.Matrix.2^ (-1) - cv.Matrix.1^ (-1) )
                                 )

      # term 2
      divergence.term.2 <- 1/2 *
                 trace.of.matrix (
                                  ( cv.Matrix.1^ (-1) + cv.Matrix.2^ (-1) ) *
                                  ( mean.Matrix.1 - mean.Matrix.2 ) *
                                  t ( mean.Matrix.1 - mean.Matrix.2 )
                                 )

      # divergence
      divergence <- divergence.term.1 + divergence.term.2

# --%<------------------------------------------------------------------------
    # and the transformed divergence
    transformed.divergence  <- 2 * ( 1 - exp ( - ( divergence / 8 ) ) )

# --%<------------------------------------------------------------------------
    # print results --- look at "separability.measures.pp"

    # get "column" names (hopefully they exist) --- to use for/with... ??? ---
    name.of.Matrix.1 <- colnames ( Matrix.1 )
    name.of.Matrix.2 <- colnames ( Matrix.2 )

    # create new objects to be used with... ?
    assign ( "divergence.vector"             , divergence             ,
 envir = .GlobalEnv)

    assign ( "jm.distance.vector"            , jm.distance            ,
 envir = .GlobalEnv)

    assign ( "bh.distance.vector"            , bh.distance            ,
 envir = .GlobalEnv )

    assign ( "transformed.divergence.vector" , transformed.divergence ,
 envir = .GlobalEnv )

    # print some message --- ?
    # a "just do it" solution using cat()
    cat ( paste ( "Divergence:             " ,
 round ( divergence , digits = 4 ) , sep = "" ) , "\n")

    cat ( paste ( "Transformed divergence: " ,
 round ( transformed.divergence , digits = 4 ) , sep = "" ) , "\n" )

    cat ( paste ( "Bhattacharryya:         " ,
 round ( bh.distance , digits = 4 ) , sep = "" ) , "\n")

    cat ( paste ( "Jeffries-Matusita:      " ,
 round ( jm.distance , digits = 4 ) , sep = "" ) , "\n")

    cat ("\n")

}

# --%<------------------------------------------------------------------------
# Sources

  # Richards, John A., Jia, X., "Remote Sensing Digital Image Analysis: ...", Springer
  # Wikipedia

# --%<------------------------------------------------------------------------
# Details

  # Bhattacharyya distance measures the similarity of two discrete or
  # continuous probability distributions.
# (from: http://en.wikipedia.org/wiki/Bhattacharyya_distance)

  # Divergence [in vector calculus] is an operator that measures the magnitude
  # of a vector field's source or sink at a given point...
# (from: http://en.wikipedia.org/wiki/Divergence)

  # Jeffries-Matusita ...
# add info...

  # Transformed divergence ...
# add info...

# --%<------------------------------------------------------------------------
# Acknowledgements

  # Augustin Lobo for initial help with the Bhattacharryya Index
  # Nikos Koutsias 
  # useRs
  # many more people...
-------------- next part --------------
# Custom function for various Separability Measures
# separability measures between   data.frames   with identical structure
  # by Nikos Alexandris, Freiburg, April 2010

# Code ---------------------------------------------------------------------- >

separability.measures.class <- function (
                                        Data.Frame.1 ,
                                        Data.Frame.2 ,
                                        print = FALSE 
                                        ) {

# where to look for custom (sub-)functions?
setwd ("~/R/code/functions/")

# DEPENDS on:
  # separability.measures.info()
source ("separability.measures.info.R")

    # create labels from names of input(ted) data.frames
    Data.Frame.1.label <- deparse ( substitute ( Data.Frame.1 ) )
    Data.Frame.2.label <- deparse ( substitute ( Data.Frame.2 ) )

    # create new vectors to carry results that will be fed in the final matrix
    assign ( "divergence.vectorS" ,
 numeric(0) , envir = .GlobalEnv )

    assign ( "transformed.divergence.vectorS" ,
 numeric(0) , envir = .GlobalEnv )

    assign ( "bh.distance.vectorS" ,
 numeric(0) , envir = .GlobalEnv )

    assign ( "jm.distance.vectorS" ,
 numeric(0) , envir = .GlobalEnv )

  # loop over number of columns of the data.frame(s)
  for ( i in 1 : dim ( Data.Frame.1 ) [2] )

    # it does not matter which data.frame  ( 1st or 2nd ) is used
    # they should be of the same structure otherwise its pointless

  # call "separability.measures.info" which calls "separability.measures"
  separability.measures.info (
                             Data.Frame.1 ,
                             Data.Frame.2 ,
                             Data.Frame.1.label ,
                             Data.Frame.2.label ,
                             i
                             )

}

#< ----------------------------------------------------------------------- Code
-------------- next part --------------
# Custom function for various Separability Measures
# construct a bi-sample matrix comparing separabilities column-by-column... !@#?
# "rows = separability.indices" and "columns = ( dim (data.frame) [2] ) * 2"
  # by Nikos Alexandris, Freiburg, April 2010

# Code ---------------------------------------------------------------------- >

# requires 3 input data.frames
separability.matrices <- function ( Reference.Data.Frame ,
                                    Target.Data.Frame.A ,
                                    Target.Data.Frame.B
                                  ) {

# suppress output -----------------------------------------------------------/
sink ( "/dev/null" )

# where
  # Data.Frame.1 -> reference
  # Data.Frame.2 -> target A
  # Data.Frame.2 -> target B

# UNCOMMENT below to be used for writing results to a csv file -- -- -- -- --

  # function name --- to be used for the exported csv filename
  separability.function.name <- sys.call (1)
#print ( separability.function.name )

   separability.function.name <- gsub ( " " , "_" , separability.function.name )
#print ( separability.function.name )

  separability.function.name <- gsub ( "," , "_" , separability.function.name )
#print ( separability.function.name )

  # filename
  separability.filename <- paste (
                                     c (separability.function.name) ,
                                     collapse = "_"
                                     )
#print ( separability.filename )

  separability.filename.csv <- paste ( separability.filename ,
                                     ".csv" ,
                                     sep = ""
                                     )
#print ( separability.filename.csv )

# -- -- -- -- UNCOMMENT above to be used for writing results to a csv file --

  # names of rows --- pass as an argument to matrix() directly ---
  separability.indices <- list ( c ( "Divergence" ,
                                     "Transformed divergence" ,
                                     "Bhattacharryya" ,
                                     "Jeffries-Matusita"
                                   )
                               )
# number of rows is "fixed" !
  separability.Matrix.rows <- length ( unlist ( separability.indices ) )

# create empty matrix to carry
#  assign (
#          "separability.Matrix.X" ,
#          matrix (
#                 numeric (0) ,
#                 nrow =  separability.Matrix.rows ,
#                 ncol = dim ( Reference.Data.Frame) [2] * 2 ,
#                 ) ,
#          envir = .GlobalEnv
#          )
# is it there?
# print ( separability.Matrix.X )

# get separabilities between reference and each target

  # source the "parent" function
  source ( "~/R/code/functions/separability.matrix.R" )

  # reference ~ target A
  separability.matrix ( Reference.Data.Frame , Target.Data.Frame.A )
  separability.Matrix.A <- separability.Matrix.X

  # reference ~ target B
  separability.matrix ( Reference.Data.Frame , Target.Data.Frame.B )
  separability.Matrix.B <- separability.Matrix.X

  # combine results in on matrix
  assign ( "separability.Matrices",

            matrix (
                     rbind (
                             separability.Matrix.A ,
                             separability.Matrix.B
                           ) ,
                     ncol = dim (separability.Matrix.A) [2] * 2
                   )
         )

# UNsuppress output ---------------------------------------------------------/
sink ()

  # row names
  rownames (separability.Matrices) <- unlist (separability.indices)

# write to csv
  write.csv (
            round (separability.Matrices , digits = 3) ,
            file = separability.filename.csv ,
            eol = "\n",
            )

  # print out
  round ( separability.Matrices , digits = 3 )

}

#< ----------------------------------------------------------------------- Code

# Sources

  # combine matrices column by column
  # <http://promberger.info/linux/2007/07/09/r-combine-two-matrices-column-by-column/comment-page-1/#comment-37240>
-------------- next part --------------
# Custom function for various Separability Measures
# more info --- used within a for() loop
  # by Nikos Alexandris, Freiburg, April 2010

# Code ---------------------------------------------------------------------- >

separability.measures.info <- function (
                                      Data.Frame.1 ,
                                      Data.Frame.2 ,
                                      Data.Frame.1.label ,
                                      Data.Frame.2.label ,
                                                       i ,
                                      print = FALSE
                                      ) {

# where to find custom (sub-)function(s)
setwd ( "~/R/code/functions/" )

# DEPENS on:
  # separability.measures()
source ("separability.measures.R")

  # pretty print ( is it really pretty? )
  cat (
        "Separability measures between" ,
        "\n" , "\n" , "  - \"" , colnames ( Data.Frame.1 ) [i] ,
        "\"  of  \"" , Data.Frame.1.label , "\"" ,
        "\n" , "        and" ,
        "\n" , "  - \"" , colnames ( Data.Frame.2 ) [i] ,
        "\"  of  \"" , Data.Frame.2.label , "\"" ,
        sep = ""
      )

  # add empty (new)line
  cat (
        "\n" , "\n"
      )

  # calculate separability measures
  separability.measures ( Data.Frame.1 [i] , Data.Frame.2 [i] )

  # fill the required vectors for the final matrix
  assign ( "divergence.vectorS"             ,
         append ( divergence.vectorS , divergence.vector ) ,
 envir = .GlobalEnv )

  assign ( "transformed.divergence.vectorS" ,
         append ( transformed.divergence.vectorS , transformed.divergence.vector) ,
 envir = .GlobalEnv )

  assign ( "bh.distance.vectorS"            ,
         append ( bh.distance.vectorS , bh.distance.vector ) ,
 envir = .GlobalEnv )

  assign ( "jm.distance.vectorS"            ,
         append ( jm.distance.vectorS , jm.distance.vector ) ,
 envir = .GlobalEnv )

  # print something to ease off visual separation of output
  cat (
        "--%<---" ,
        "\n"
      )

}

#< --------------------------------------------------------------------- Code #
-------------- next part --------------
# Custom function for various Separability Measures
# construct a matrix with:
# "rows = separability.indices" and "columns = dim (data.frame) [2]"
  # by Nikos Alexandris, Freiburg, April 2010

# Code ---------------------------------------------------------------------- >

separability.matrix <- function (Data.Frame.M1,Data.Frame.M2) {

# where to look for custom (sub-)functions?
setwd ( "~/R/code/functions/" )

# DEPENDS on: separability.measures.class()
source ( "separability.measures.class.R" , local = FALSE )

# UNCOMMENT below to be used for writing results to a csv file ---------------

  # function name --- to be used for the exported csv filename
  separability.function.name <- sys.call (1)
#print ( separability.function.name )

   separability.function.name <- gsub ( " " , "_" , separability.function.name )
#print ( separability.function.name )

  separability.function.name <- gsub ( "," , "_" , separability.function.name )
#print ( separability.function.name )

  # filename
  separability.filename <- paste (
                                     c (separability.function.name) ,
                                     collapse = "_"
                                     )
#print ( separability.filename )

  separability.filename.csv <- paste ( separability.filename ,
                                     ".csv" ,
                                     sep = ""
                                     )
#print ( separability.filename.csv )

# ------------ UNCOMMENT above to be used for writing results to a csv file --

  # names of rows --- pass as an argument to matrix() directly ---
  separability.indices <- list ( c ( "Divergence" ,
                                     "Transformed divergence" ,
                                     "Bhattacharryya" ,
                                     "Jeffries-Matusita"
                                   )
                               )

  # number of rows is "fixed" !
  separability.Matrix.rows.number <- length ( unlist ( separability.indices ) )

  # number of columns depends on the data frames
  if ( dim (Data.Frame.M1) [2] == ( dim (Data.Frame.M2) [2] ) )

      assign ( "separability.Matrix.columns.number" , dim (Data.Frame.M1) [2] , envir = .GlobalEnv ) else cat ( "Warning: number of columns differs between the 2 data frames (while it should be the same?)." , "\n" )

  # construct a matrix of [ Matrix.rows x Matrix.columns ]
  assign (
         "separability.Matrix" ,
         matrix (
                numeric (0) ,
                nrow = separability.Matrix.rows.number ,
                ncol = separability.Matrix.columns.number ,
                dimnames = separability.indices
                ) ,
         envir = .GlobalEnv
         )

  # remove unnecessary objects
  rm ( separability.Matrix.rows.number )
  #rm ( separability.Matrix.columns.number )
  rm ( separability.indices )

  # names of columns derived from the data frames
  if ( any ( colnames (Data.Frame.M1) == colnames (Data.Frame.M2) ) )

  colnames (separability.Matrix) <- colnames (Data.Frame.M1) else cat ( "Warning: names of columns differ between the 2 data frames (while they should be the same?)." , "\n" , "\n" )

  # suppress print-outs
  sink ( "/dev/null" )

  # get separability measures
  separability.measures.class ( Data.Frame.M1 , Data.Frame.M2 )

  # un-suppress print-outs
  sink ()

  # fill the Matrix
  separability.Matrix [1, ] <- divergence.vectorS
  separability.Matrix [2, ] <- transformed.divergence.vectorS
  separability.Matrix [3, ] <- bh.distance.vectorS
  separability.Matrix [4, ] <- jm.distance.vectorS

# ---------------------------------------------------------------------------/
#   # make it available globally (for the next function): is this a bad idea?
  assign (
         "separability.Matrix.X" ,
         separability.Matrix ,
         envir = .GlobalEnv
         )
# ---------------------------------------------------------------------------/

  # write to csv file   --- UNCOMMENT to write to file
  # getwd()

  # hardcoded: set _some_ working directory
  setwd ( "~/R/data/separabilities/" )

  # write to csv
  write.csv (
            round (separability.Matrix , digits = 3) ,
            file = separability.filename.csv ,
            eol = "\n",
            )

  # print the Matrix
  round ( separability.Matrix , digits = 3 )

}

#< ----------------------------------------------------------------------- Code

# Comments

  # The Jeffries-Matusita (and successively the Bhattacharryya) distance
  # has been cross-checked with ERDAS Imagine's respective tool.

  # The results are highly accurate. ERDAS "prefers" for some reason, _not_ to
  # include all observations for the calculations of mean(s) and 
  # variance(s)-covariance(s)
-------------- next part --------------
# Custom function for various Separability Measures
# construct a bi-sample matrix comparing separabilities column-by-column... !@#?
# "rows = separability.indices" and "columns = ( dim (data.frame) [2] ) * 2"
# for samples derived from different "sources" <<< explain better please!!!
  # by Nikos Alexandris, Freiburg, April 2010

# Code ---------------------------------------------------------------------- >

# requires 4 input data.frames
separability.matrices_multiref <- function ( Reference.Data.Frame.A ,
                                    Target.Data.Frame.A ,
                                    Reference.Data.Frame.B ,
                                    Target.Data.Frame.B 
                                  ) {

# suppress output -----------------------------------------------------------/
sink ( "/dev/null" )

# where
  # Reference.Data.Frame.A -> reference.A
  # Target.Data.Frame.A -> target A
  # Reference.Data.Frame.B -> reference.B
  # Target.Data.Frame.B -> target B

  # names of rows --- pass as an argument to matrix() directly ---
  separability.indices <- list ( c ( "Divergence" ,
                                     "Transformed divergence" ,
                                     "Bhattacharryya" ,
                                     "Jeffries-Matusita"
                                   )
                               )
# number of rows is "fixed" !
  separability.Matrix.rows <- length ( unlist ( separability.indices ) )

# get separabilities between reference and each target

  # source the "parent" function
  source ( "~/R/code/functions/separability.matrix.R" )

  # reference ~ target A
  separability.matrix ( Reference.Data.Frame.A , Target.Data.Frame.A )
  separability.Matrix.A <- separability.Matrix.X

  # reference ~ target B
  separability.matrix ( Reference.Data.Frame.B , Target.Data.Frame.B )
  separability.Matrix.B <- separability.Matrix.X

# sort and print out results

  # combine results in on matrix
  assign ( "separability.Matrices",

            matrix (
                     rbind (
                             separability.Matrix.A ,
                             separability.Matrix.B
                           ) ,
                     ncol = dim (separability.Matrix.A) [2] * 2
                   )
         )

# UNsuppress output ---------------------------------------------------------/
sink ()

  # row names
  rownames (separability.Matrices) <- unlist (separability.indices)

  # print out
  round ( separability.Matrices , digits = 3 )

}

#< ----------------------------------------------------------------------- Code