[R] Bivariate kernel density bandwidth selection

Glen Sargeant gsargeant at usgs.gov
Thu Dec 9 19:43:10 CET 2010

I've been trying to implement bivariate kernel density estimation.  For data
like mine, function "kde" from package "ks" with bandwidth matrix derived by
function "Hscv" seems like a very good choice.  Unfortunately, Hscv seems
unmanageably slow except for very small sample sizes (up to a few hundred)
and my sample sizes are quite large (up to a few thousand).  I've reviewed
help files, vignettes, previous postings on this list, and the JSS paper
describing ks and haven't found much mention of constraints on sample size
other than using kfold cross-validation to speed calculation.  Unfortunately,
that option is listed but not enabled for Hscv.

An example illustrates my problem.  Each of the following expressions
returns the time elapsed to estimate a bandwidth matrix.  The first is for a
sample of 100 x and y coordinates; the second is for a sample of 200 x and y
coordinates:

>     system.time(Hscv(x=xy.100))
   user  system elapsed 
   1.97    0.03    2.00 

>     system.time(Hscv(x=xy.200))
   user  system elapsed 
   6.03    0.17    6.22 

I have to do this many, many times and each run will involve up to several
thousand records, so you can see my problem.
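
To put a rough number on this, here is a back-of-envelope sketch in base R
that extrapolates from the two timings above.  It assumes elapsed time grows
as a power law in sample size, which two data points cannot confirm, and the
target of 3000 records is only an illustrative stand-in for "several
thousand".  The helper name extrapolate_time is made up for this example.

```r
# Elapsed times (seconds) reported above for n = 100 and n = 200
n       <- c(100, 200)
elapsed <- c(2.00, 6.22)

# Assumption: time = a * n^b; estimate b from the slope on the log-log scale
b <- diff(log(elapsed)) / diff(log(n))

# Hypothetical helper: scale the n = 100 timing to another sample size
extrapolate_time <- function(n_new) elapsed[1] * (n_new / n[1])^b

n_target    <- 3000L   # illustrative sample size, not from the data
est_seconds <- extrapolate_time(n_target)

cat(sprintf("fitted exponent: %.2f\n", b))
cat(sprintf("estimated time for n = %d: %.0f seconds (%.1f minutes)\n",
            n_target, est_seconds, est_seconds / 60))
```

If the true cost is closer to quadratic or worse in n, this figure would
understate the run time, so it is best read as a lower-end guess per run.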

I should think that others must surely have encountered and overcome this
challenge.  If anyone can kindly point me in a productive direction, I will
be most grateful.

Glen Sargeant
Research Wildlife Biologist