[R] lokern package

Martin Maechler maechler at stat.math.ethz.ch
Thu Jul 2 18:34:43 CEST 2009


>>>>> "RV" == Ravi Varadhan <RVaradhan at jhmi.edu>
>>>>>     on Thu, 2 Jul 2009 10:51:03 -0400 writes:

    RV> Dear Martin, I have been playing a lot with the
    RV> glkerns() function in the "lokern" package for
    RV> "automatic" smoothing of time-series data.  This kernel
    RV> smoothing approach of Gasser and Mueller seems to
    RV> perform quite well for estimating the function and its
    RV> derivatives (first and second derivatives).  In fact,
    RV> this is one of the best methods based on my simulation
    RV> studies for comparing a number of "automatic" bandwidth
    RV> selection methods.  

That's quite interesting! 
What methods did you use in your comparison?
[I assume you did use  smooth.spline() as well]

    RV> I am interested in applying this to
    RV> automatic smoothing and feature extraction for a "large"
    RV> number (thousands) of time series, with hundreds of
    RV> points per time series.  This is where I am interested
    RV> in seeing if the efficiency of "glkerns" can be
    RV> improved.
 
    RV> Here follows my specific question:
 
    RV> You have to call glkerns() separately each time to
    RV> compute the function and its derivatives, ie. if I want
    RV> the function and its first 2 derivatives, I have to make
    RV> 3 calls to glkerns().  

Yes.  Note however that each of these calls chooses a different
kernel order ('korder') by default { korder <- deriv + 2 },
and uses *different* automatic bandwidths depending on both
deriv and korder.

Consequently, the result of	     glkerns(x,y, deriv=1)  is 
*not* the derivative of the estimate glkerns(x,y, deriv=0)
but rather an independent estimate of the derivative of the
unknown g(). The same applies for deriv=2 etc.

But the problem is even "worse": Let's assume you get the "optimal" global
bandwidth estimate 'bandwidth' = h  from the deriv=0 case.
Then you could set this bandwidth explicitly for the deriv=1 and
deriv=2 calls {and you'd gain quite a bit of execution time !}.
*HOWEVER*, as the internal codes absolutely require
 
    k_{ord} - nu  ==  korder - nu  to be an *even* number,

you can *not* keep 'korder' fixed together with 'bandwidth'.
But if you change  'korder', the kernels change and the meaning
of the bandwidth has changed too.

I have just uploaded version 1.0-8 of package 'lokern' to CRAN,
and this now contains a demo("gkl-derivs") which defines a utility function
to play a bit with this (keeping bandwidth fixed).

    RV> This seems to me to be inefficient, especially for large
    RV> time series.  In smooth.spline(), for example, you can
    RV> call it once to get the fit, and then use the fitted
    RV> object to compute the derivatives using predict().  

Note that smooth.spline() is very different:
If you use deriv=1 or deriv=2 in the predict() call,
you get exactly the 1st or 2nd derivative of the same smoothing
spline function.

    RV> Is it possible to have a similar feature in glkerns?

As I have explained, this is not easily possible currently
more for conceptual reasons than others.

In principle it should be possible however,
but that would both need some "theoretical" work
{maybe just collecting the papers and formulae used} and 
then some Fortran code digging and shuffling.

Would be a nice "semester work" for a student...
Martin

--
Martin <Maechler at stat.math.ethz.ch>  http://stat.ethz.ch/people/maechler
Seminar für Statistik, ETH Zürich  HG G 16      Rämistrasse 101
CH-8092 Zurich, SWITZERLAND
phone: +41-44-632-3408       fax: ...-1228      <><

 
    RV> Thanks for any suggestions.

    RV> Ravi.
    RV> ----------------------------------------------------------------------------
    RV> -------

    RV> Ravi Varadhan, Ph.D.

    RV> Assistant Professor, The Center on Aging and Health

    RV> Division of Geriatric Medicine and Gerontology

    RV> Johns Hopkins University

    RV> Ph: (410) 502-2619

    RV> Fax: (410) 614-9625

    RV> Email: rvaradhan at jhmi.edu

    RV> Webpage:
    RV> http://www.jhsph.edu/agingandhealth/People/Faculty_personal_pages/Varadhan.h
    RV> tml

 

    RV> ----------------------------------------------------------------------------
    RV> --------

 

    RV> 	[[alternative HTML version deleted]]

    RV> ______________________________________________
    RV> R-help at r-project.org mailing list
    RV> https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do
    RV> read the posting guide
    RV> http://www.R-project.org/posting-guide.html and provide
    RV> commented, minimal, self-contained, reproducible code.




More information about the R-help mailing list