[R] Approximate Entropy?

Tue Dec 23 17:25:59 CET 2008

Stephan Kolassa wrote:
> 
> Dear guRus,
> 
> is there a package that calculates the Approximate Entropy (ApEn) of a 
> time series?
> 
> RSiteSearch only gave me a similar question in 2004, which appears not 
> to have been answered:
> http://finzi.psych.upenn.edu/R/Rhelp02a/archive/28830.html
> 
> RSeek.org didn't yield any results at all.
> 
> Happy holidays (where appropriate),
> Stephan
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> 

I went ahead and translated D. Kaplan's MATLAB code for this purpose into R.
Warning: it's barely tested, and I'm not sure I understand what it's doing
-- you're on your own from here ...
(it won't work if you simply cut and paste what's here, since the examples
are above the definitions
of the utility functions etc.)

## http://www.macalester.edu/~kaplan/hrv/doc/funs/apen.html

## Approximate Entropy
## Syntax

## entropy = apen( pre, post, r );

## Arguments
## pre 	An embedding of data.
## post 	The images of the data in the embedding.
## r 	The filter factor, which sets the length scale over which to compute
the approximate entropy.
## Returned Values
## entropy 	The numerical value of the approximate entropy.
## Description

## The "approximate entropy" was introduced by Pincus to quantify the
## creation of information in a time series. A low value of the
## entropy indicates that the time series is deterministic; a high
## value indicates randomness.

## The "filter factor" r is an important parameter. In principle, with
## an infinite amount of data, it should approach zero. With finite
## amounts of data, or with measurement noise, it is not always clear
## what is the best value to choose. Past work on heart rate
## variability has suggested setting r to be 0.2 times the standard
## deviation of the data.

## Another important parameter is the "embedding dimension." Again,
## there is know precise means of knowing the best such dimension, but
## previous work has used a dimension of 2. The final parameter is the
## embedding lag, which is often set to 1, but perhaps more
## appropriately is set to be the smallest lag at which the
## autocorrelation function of the time series is close to zero.

## The apen function expects the data to be presented in a specific
## format. Working with a time series tseries, the following steps
## will compute the approximate entropy, with an embedding dimension
## of 2 and a lag of 1.

## edim = 2;
## lag = 1;
## edata = lagembed(tseries,edim,lag);
## [pre,post] = getimage(edata,lag);
## r = 0.2*std(tseries);
## apen(pre,post,r);

edim <- 2
lag <- 1
edata <- lagembed(tseries,edim,lag)
im <- getimage(edata,lag)
r <- 0.2*sd(tseries)
apen(im$pre,im$post,r)

apenembed <- function(tseries,edim,lag=1,relr=0.2,r) {
  edata <- lagembed(tseries,edim,lag)
  im <- getimage(edata,lag)
  if (missing(r)) r <- relr*sd(tseries)
  apen(im$pre,im$post,r)
}

## References

##     * SM Pincus (1991) Proc. Natl. Acad. Sci. USA 88:2297-2301
##     * D Kaplan, MI Furman, SM Pincus, SM Ryan, LA Lipsitz, AL Goldberger
(1991) "Aging and the complexity of cardiovascular dynamics," Biophys.J.
59:945-949 

## See Also
## apenhr. lagembed. getimage.
## Examples

lagembed <- function(ts,d,lag) {
  z <- embed(ts,d)
  z[seq(1,nrow(z),by=lag),]
}

getimage <- function(x,pred) {
  pre <- x[1:(nrow(x)-pred),]
  post <- x[(pred+1):nrow(x),1]
  list(pre=pre,post=post)
}

## tseries = randn(500,1);
## should have a large approximate entropy.

set.seed(1001)
apenembed(rnorm(500),edim=3)
## 0.3240493

apenembed(sin(seq(1,100,by=0.2)),edim=3)

## should have a small approximate entropy since it is deterministic. 
## 0.1001811

## examples from utils web page:
ts <- 1:5
x <- lagembed(ts,3,1)
im <- getimage(x,1)

apen <- function(pre,post,r) {

  ## translated from Kaplan D (1998) HRV software.
  ## http://www.macalester.edu/~kaplan/hrv/doc/Feb3snap.tar
  ## computer approximate entropy a la Steve Pincus

  N <- nrow(pre)
  p <- ncol(pre)

  ## number of pairs of points closer than r in pre/post space
  phiM <- phiMplus1 <- 0
  for (k in 1:N) {
    ## replicate the current point
    foo <- matrix(rep(pre[k,],N),byrow=TRUE,nrow=N)
    ## calculate distance (max norm)
    goo <- abs(foo-pre)<=r
    ## which ones of them are closer than r using the max norm?
    closerpre <- if (p==1) goo else apply(goo,1,all)
    precount <- sum(closerpre)
    phiM <- phiM+log(precount)

    ## of the ones that were closer in the pre space, how many are closer
    ## in post also ?
    postcount <- sum(abs(post[closerpre] - post[k])<r)
    phiMplus1 <- phiMplus1 + log(postcount)
    ## cat(k,precount,phiM,postcount,phiMplus1,"\n")
  }
  (phiM-phiMplus1)/N
}

## also see:
## http://www.cbi.dongnocchi.it/glossary/ApEn.html
## http://www.physionet.org/physiotools/ApEn/

ts <- rep(61:65,10)
apenembed(ts,d=5,r=2) ## get 0 instead of 0.00189?

-- 
View this message in context: http://www.nabble.com/Approximate-Entropy--tp21144062p21147242.html
Sent from the R help mailing list archive at Nabble.com.