[R] Identifying peaks (or offsets) in a time series

(Ted Harding) Ted.Harding at nessie.mcc.ac.uk
Tue Jul 25 13:21:29 CEST 2006


On 25-Jul-06 Ulrik Stervbo wrote:
> Could Petr Pikal's peaks function
> (http://finzi.psych.upenn.edu/R/Rhelp02a/archive/33097.html)
> be of any use?
> 
> Ulrik
> 
> On 7/24/06, Tauber, Dr E. <et22 at leicester.ac.uk> wrote:
>> Dear R-users,
>>
>> We are monitoring the activity of animals during a few days
>> period. The data from each animal (crossing of infra-red beam)
>> are collected as a time series (in 30 min bins). An example is
>> attached below.
>>
>> y <-
>> c(0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,3,28,27,46,76,77,60,
>> 19,35,55,59,48,87,20,38,82,62,60,85,105,69,109,102,100,101,
>> 116,126,119,63,27,25,15,8,0,0,3,0,0,3,0,0,5,3,0,0,6,1,29,
>> 73,56,56,57,92,34,51,30,76,30,38,47,87,22,0,68,76,94,101,
>> 119,114,115,111,116,134,125,76,23,19,30,2,8,0,3,0,0,0,7,
>> 0,0,0,0,4,0,7,0,21,4,49,51,56,43,55,55,34,48,16,0,61,22,
>> 94,63,102,47,100,96,113,93,109,123,120,124,115,94,96,76,36,3,
>> 0,0,0,0,0,0,2,5,0,0,0,0,2,10,33,34,15,0,47,22,20,33,52,4,
>> 41,45,0,21,18,38,32,21,78,82,72,102,103,118,116,118,114,82,
>> 18,5,21,4,0,14,0,5,2,0,0,2,2,0,0,3,0,2,7,16,13,17,50,0,
>> 48,16,19,34,39,33,3,67,0,68,34,65,84,61,100,85,108,124,
>> 141,139,134,96,54,91,54,12,0,0,0,0,0,0,0,0,0,0,0,4,11,
>> 0,19,27,15,12,20)
>>
>> We would like to have an automatic way, using R, to identify
>> the time point of offset of each bout of activity (i.e. when
>> activity goes down to a minimum value, for a defined duration).
>> In the example above the offset times (the element number)
>> should be approximately: 53, 99, 146, 191, 239, 283, 330
>> (the last bout of activity can be ignores).
>>
>> Any help or advice will be greatly appreciated,
>>
>> Many thanks, Eran

First of all, I find only 255 values in your 'y' above, so I'm
not sure whether your "53, 99, 146, 191, 239, 283, 330" is
referring to the same series. So this may not be helpful in
interpreting what you mean by "goes down to a minimum value
for a defined duration".

To interpret that clearly, one needs a specified value for
"minimum value" and a specified value for "defined duration".

You can use R to see what happens in the series by methods
such as the following.

You can plot the series (with 24-hour indicators) with

  t0<-c(  0, 48, 96,144,192,240)
  y0<-c(  0,  0,  0,  0,  0,  0)
  y1<-c(140,140,140,140,140,140)
  plot(y)
  lines(y)
  for(i in (1:6)){lines(c(t0[i],t0[i]),c(y0[i],y1[i]),col="green")}

Adopting for illustration "minimum value" = 0,

  Z <- 1*(y==0) ## a series of 0 and 1, Z=1 whenever y=0

  Runs<-rle(Z)  ## "run length encoding" function -- see ?rle
  R.L<-Runs$lengths
  R.V<-Runs$values
  R.L[R.V==1]   ## gives the lengths of runs of "y=0"
## [1]  5 12  2  2  2  2  1  1  3  4
##[11]  1  1  1  6  4  1  1  1  1  2
##[21]  2  1  1  1 11  1

So you can see that available values for "defined duration"
in the series are:

  1 (13 times), 2 (6), 3 (1), 4 (2), 5 (1), 6 (1), 11 (1), 12 (1)

Next, there is clearly an overall 48-point periodicity (which
of course corresponds to a 48*30min = 24-hour periodicity), and
there are 5 complete 48-point segments in your series:

  1:48, 49:96, 97:144, 145:192, 192:240

with a trailing incomplete segment of 15 points. Overall (see
the plot) there are 6 stretches of "very low" activity every
24 hours, so from the above you could think you can obtain these
by choosing "defined duration" = 4 (any run of 0 of length at
least 4 is a "period of very low activity", since there are
2 (=4) + 1 (=5) + 1 (=6) + 1 (=11) + 1 (=12) = 6 of these. But,
as you can see from the plot, this will pick out wrong segments!

This is because the second period of "very low" activity is 0's
broken by relatively frequent positive values.

So you need to raise your threshold of "activity" a bit. Now
you're venturing into "groping" territory -- since the nice
clean "minimum value" = 0 doesn't work, you have to find one
which does.

If you go up to "minimum value = 5", you get:

  Z <- 1*(y<=5); Runs<-rle(Z); R.L<-Runs$lengths; R.V<-Runs$values
  R.L[R.V==1]
## [1] 19 12  1  1  1  5  6  1  1  1
##[11] 14  1  1  1  1  2 12  1  1  1 12  1

which better captures the 6 periods, except that the third one
is split (5,6). Only when you go up to 7 do you finally get

  Z <- 1*(y<=7); Runs<-rle(Z); R.L<-Runs$lengths; R.V<-Runs$values
  R.L[R.V==1]
## [1] 19 14  1  1 14  1  1 14  1  1  1  1  2 13  1  1  1 12  1

and now you can find your 6 "low activity" periods where you
have 19,14,14,14,13,12.

So, in terms of "activity goes down to a minimum value for a
defined period", you need (for this data set) "minimum value"
to be at least 7, and "defined period" could be anything at
most 12 but greater than 2; in this case therefore, for instance,
you can choose say 8 for the latter.

To really locate these periods in the original series, you now
need (say)

  R.L
## [1] 19 32 14 15  1 15  1  1 14  1
##[11]  1  9  1 19 14  4  1  5  1  2
##[21]  1 16  1  1  2  1 13  4  1  6
##[31]  1  1  1 17 12  1  1  5

  ix<-(R.V==1)&(R.L>8) ##(Activity<="min val"=7)&(runlength>8)
  1*ix
## [1]  1  0  1  0  0  0  0  0  1  0
##[11]  0  0  0  0  1  0  0  0  0  0
##[21]  0  0  0  0  0  0  1  0  0  0
##[31]  0  0  0  0  1  0  0  0

  (1-ix)
## [1]  0  1  0  1  1  1  1  1  0  1
##[11]  1  1  1  1  0  1  1  1  1  1
##[21]  1  1  1  1  1  1  0  1  1  1
##[31]  1  1  1  1  0  1  1  1

So "inactivity" is defined by 1*ix==1, and "activity" by (1-ix)==1.

You can now match "inactivity" to the original series using the
"rep" function:

  inact<-rep(1*ix,R.L)

and check this by adding lines to the plot:

  lines(inact,col="blue")

(where the high value "100" corresponds to inactivity).
Now anything you want to find out about the periods of "inactivity"
(as defined above) ccan be determined from the sequence "inact".

However, it is clear that this approach (which is primarily
motivated by trying to clarify your statement of your aims
in terms of the data you supplied) is clearly unlikely to be
useful when you have many such datasets, since you will have
to go into "groping" mode for each dataset, and are likely to
get many different answers over your several datasets.

At this point, I think, it is becoming clear that you need to
think about what you *really" mean by "actvity", and about
how to model it in terms which can be carried over across
datasets.

There is a fundamental point about how your data represent
real activity -- to what extent is there scope for the animal
to be active without breaking the light beam? -- but I'll leave
that one aside.

Taking the 24 hours (48*30min) to run from midnight at the
start of the series, there is an apparent cycle of activity
which, overall, consists of inactivity until somewhere in the
(approximate) time range 06:30-09:00, then an increasing trend
until a maximum which occurs between 18:00 and 23:00, followed by
a rapid decline to a very low level which takes place over some
2-4 hours.

You can get a numerical overview of this variation from day to
day by using commands like

  y[1:48]
  y[(1:48)+48*1]
  y[(1:48)+48*2]
  y[(1:48)+48*3]
  y[(1:48)+48*4]
  y[(1:48)+48*5]

It is tempting to try to model this rise and fall by a mathematical
curve. To the extent that this well represents the initial stage
of the rise, and also the final stage of the decline, you could
then solve that curve for the point at which you decide that
activity starts and ends.

However, it is apparent from the plot that the rising phase, at
least, is not simple. In most cases there is a "midday peak" over
approx 10:00-14:00 (of variable duration) followed by a brief
"siesta", then the sharp rise to the late night peak.

Also, it seems from this dataset that the "day" of this animal
is a bit shorter than 24 hours, since the major peak shifts
steadily from 23:00 on the first day to 19:00 on the 5th day,
i.e. 1 hour per day. In addition, the magnitude of the "midday
peak" decreases steadily over the days, while its distinctiveness
tends to increase. Apparently, there is some progressive factor in
operation which influences activity.

So (and I can't comment on whether such things are intrinsic to
the behaviour of your animal or have been brought about by your
experimental setup) a detailed modelling of the activity cycle
would need to bring such comsiderations into the model. How to
do this, only you can say!

Hoping that this helps!
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 25-Jul-06                                       Time: 12:21:24
------------------------------ XFMail ------------------------------



More information about the R-help mailing list