[R-sig-Geo] Why do kde estimates from ks and adehabitatHR packages in R produce different results? (with same bandwidth and grid)

Wed Feb 5 16:40:36 CET 2020

Hi All,

I have a series of VHF radio-telemetry data from animals that I am hoping
to be able analyze in R. I have done my research on the different packages
available for this, and I believe that utilizing the package ks to select a
plug-in bandwidth method for all individuals will be most appropriate,
based on previous research with my study species. This is reinforced in
section 4.3 of this spatial ecology book
<https://ecosystems.psu.edu/research/labs/walter-lab/manual/home-range-estimation/link-to-pdf>
and
in this paper by Millspaugh et al (2006)
<https://wildlife.onlinelibrary.wiley.com/doi/pdf/10.2193/0022-541X%282006%2970%5B384%3AAORSUU%5D2.0.CO%3B2>
.

I have written two R functions, initially to check that I got similar
results, and then hopefully select one to analyze the home ranges of my
study animals. My results from the two functions were not as similar as I
hoped, and that is what brings me to this forum - to figure out why. The
first function relies entirely on the ks package, by using the kde()
function and a pre-defined grid expanded by a factor of the bandwidth to
get the volume of the Utilization distribution (UD). The UD was then sorted
by high to low density. I calculated the size of a grid cell, then
calculated the cumulative sum of the grid cell volumes. I then took the
number of cells in this list needed to give a cumulative volume of 0.95 or
0.50 (depending on if I am interested in the 95% or 50% kde area) and then
multiplied that number of cells by the cell size to get the area. Perhaps
there is a flaw in my logic here that I am not seeing, if so please alert
me.

The annotated R code for that function is here:

kde_ks.hpi <- function(filename, percentage, gridsize, multiplier){
    data <- read.csv(file = filename)
    x <- as.data.frame(data$X)
    y <- as.data.frame(data$Y)
    loc <- cbind(x,y)
    data.h <-Hpi(loc)
    x.grid.size<- (gridsize) #Set grid size as no. of nodes in the x
direction
    band.mult<- (multiplier)
    x=seq(min(loc[,1])-band.mult*sqrt(data.h[1,1]), max(loc[,1]) +
band.mult *sqrt(data.h[1,1]), length.out=x.grid.size)
    y=seq(min(loc[,2])-band.mult*sqrt(data.h[2,2]), max(loc[,2])+
band.mult*sqrt(data.h[2,2]), by=(x[2]-x[1])) #sets out nodes in y axis,
spaced the same as x axis
    eval.pts <-expand.grid(x,y)
    UD <-kde(loc,H=data.h,eval.points=eval.pts)
    output<-
data.frame(cbind(UD$eval.points[,1],UD$eval.points[,2],UD$estimate))
    colnames(output)<- c("xcoord","ycoord","UD_ht")
    cell.area <- (x[2]-x[1])*(y[2]-y[1])
    grid.vol <- sum(cell.area*output[,3]) #check that grid UD vol is >=0.99
    vol<-output[,3]/sum(output[,3]) #standardize volume to add to 1
    output<-data.frame(cbind(output,vol))
    output<-output[order(-output$vol),] #sort by descending volume
    cumV<-cumsum(output$vol) #calculate cumulative UD volume
    output<-data.frame(cbind(output,cumV)) #add cumulative vol to output
    sub.UD<-subset(output,output$cumV<=(percentage/100)) #subset output by
percent vol
    cellcount <- nrow(sub.UD)
    area <- cellcount*cell.area
    KDE <- data.frame(c(area,grid.vol))
    KDE
    }

So as an example -
kde_ks.hpi("C01 .csv", percentage=50,gridsize =150,multiplier =3) #I will
attach a stripped down version of this data file if anyone is interested in
reproducing my results.

In this case - I found a 99% home range area of 3958 meter^2 for this
individual.

My second function is largely based on code that I found from the Spatial
Ecology textbook that I referenced above. This code transforms the
bandwidth and kde from the ks package with rasters to utilize functions in
the adehabitatHR function to easily get UD volumes and areas. The
difference between my code and the one in the book is that I define the
grid using the same methods as the first function. Yet, I don't get the
same results, despite using the same grid and bandwidth as I did in the
previous function. Below is the annotated code for that function:

CRS.SC <http://crs.sc/> <- CRS("+init=epsg:32616") #define study area
projection first

kde_ade.ks.hpi <- function(filename, percentage, gridsize, multiplier){
  data <- read.csv(file = filename) #read in data
  x <- as.data.frame(data$X)
  y <- as.data.frame(data$Y)
  xy <- c(x,y)
  loc <- cbind(x,y)
  data.h <-Hpi(loc) #define bandwidth using plug-in method
  data.proj <- SpatialPointsDataFrame(xy,data, proj4string = CRS.SC
<http://crs.sc/>)
  boundingVals <- data.proj using bbox #get the bounding values of the animal
locations
  band.mult <- (multiplier) #define the value used to expand the evaluation
grid
  y.expand <- band.mult*sqrt(data.h[2,2]) #set the value used to expand the
grid by a function of the multiplier and the bandwidth
  x.expand <- band.mult*sqrt(data.h[1,1]) #note that this expansion is
different in the x and y direction
  deltaX <- as.integer(((boundingVals[1,2]) - (boundingVals[1,1])) +
(2*x.expand)) #get the total length of the grid axis
  deltaY <- as.integer(((boundingVals[2,2]) - (boundingVals[2,1])) +
(2*y.expand))
  x.grid.size <- (gridsize) #set the number of grid nodes in the X direction
  gridRes <- deltaX/x.grid.size #determine the grid resolution, (i.e the
size of one side of a cell)
  y.grid.size <- deltaY/gridRes #determine the number of nodes in the Y
direction using the same cell size as the x axis
  boundingVals[2,1] <- boundingVals[2,1] - y.expand #min Y - expand the in
each direction grid by the previously determined value
  boundingVals[2,2] <- boundingVals[2,2] + y.expand #max Y
  boundingVals[1,1] <- boundingVals[1,1] - x.expand #min X
  boundingVals[1,2] <- boundingVals[1,2] + x.expand #max X
  gridTopo <- GridTopology((boundingVals[,1]),
c(gridRes,gridRes),c(x.grid.size,y.grid.size)) #Grid Topology object is
basis for sampling grid (offset, cellsize, dim)
  sampGrid <- SpatialGrid(gridTopo, proj4string = CRS.SC
<http://crs.sc/>) #Using
the Grid Topology and projection create a SpatialGrid class
  sampSP <- as(sampGrid, "SpatialPixels") #Cast over to Spatial Pixels
  sampRaster <- raster(sampGrid) #convert the SpatialGrid class to a raster
  sampRaster[] <- 1 #set all the raster values to 1 such as to make a data
mask
  evalPoints <- xyFromCell(sampRaster, 1:ncell(sampRaster)) #Get the center
points of the mask raster with values set to 1
  hpikde <- kde(x=loc, H=data.h, eval.points=evalPoints) #Create the KDE
using the evaluation points
  hpikde.raster <- raster(sampRaster) #Create a template raster based upon
the mask and then assign the values from the kde to the template
  hpikde.raster <- setValues(hpikde.raster,hpikde$estimate)
  hpikde.px <- as(hpikde.raster,"SpatialPixelsDataFrame") #Cast over to
SPxDF
  hpikde.ud <- new("estUD", hpikde.px) #create new estUD using the SPxDF
  hpikde.ud using vol = FALSE #Assign values to a couple slots of the estUD
  hpikde.ud using h$meth = "Plug-in Bandwidth"
  hpikde.ud.vol <- getvolumeUD(hpikde.ud, standardize=TRUE) #Convert the UD
values to volume using getvolumeUD from adehabitatHR and cast over to a
raster
  hpikde.ud.vol.raster <- raster(hpikde.ud.vol)
  hpikde.vol <- getverticeshr(hpikde.ud, percent = percentage,ida = NULL,
unin = "m", unout = "m2", standardize=TRUE) #Here we generate volume
contours using the UD
  hpikde.vol$area #Determine UD area at that contour
}

So for example:
> kde_ade.ks.hpi(filename = "C01 .csv",
percentage=50,gridsize=150,multiplier=3)
[1] 3965.43

So in this case - I found a 50% kde home range area of 3965 meter^2 for
this individual. Not really that big a difference from the other value of
3958 m^2- but I would love to know where it comes from, as it the
difference between the two functions can vary based on how I define the
grid, multiplier, or percentage. At 99% the same data set produces a value
of 67182 m^2 and 67453 m^2 for the first and second methods, respectively.
In this case the second method produces a larger value both times, but when
I try this with a different animal, the first method gives me larger areas.
What is going on here, why is there a difference? Is one method better than
another? What do you all recommend for moving forward?

I am hoping to use these functions and alter them to produce shapefiles
that I can then map in GIS, once my methodology is settled.

Thank you very much for your time,

Brad Nissen

*--May the forest be with you! *

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://stat.ethz.ch/pipermail/r-sig-geo/attachments/20200205/3e26c5a6/attachment.html>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: C01 .csv
Type: application/vnd.ms-excel
Size: 2081 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-geo/attachments/20200205/3e26c5a6/attachment.xlb>