[R] Time series data with dropouts/gaps
Gabor Grothendieck
ggrothendieck at gmail.com
Tue Oct 26 06:37:05 CEST 2010
On Tue, Oct 26, 2010 at 12:28 AM, Bob Cunningham <FlyMyPG at gmail.com> wrote:
> I have time-series data from a pair of inexpensive self-logging 3-axis
> accelerometers (http://www.gcdataconcepts.com/xlr8r-1.html). Since I'm not
> sure of the vibration/shock spectrum I'm measuring, for my initial sensor
> characterization run the units were mounted together with the sample rate
> set to the maximum of 640 samples/sec.
>
> Unfortunately, at this sample rate there are significant data dropouts at
> various scales (a phenomenon not present at data rates of 160 Hz and below):
>
> 1. Approximately every 20ms, a few samples are dropped (believed to be due
> to internal buffer wrapping).
>
> 2. Approximately every 200ms, about 50 samples are dropped (believed to be
> due to flash write times).
>
> 3. At seemingly random intervals, a sample will appear with an out-of-order
> timestamp (vendor is diagnosing).
>
> Initially, I'm trying to answer the following questions:
>
> A. How well do the 2 units compare? (Calibration, time-base drift, etc.)
>
> B. Can I use a lower sample rate? (What is the observed spectrum?)
>
> I started attacking the problem in Python (numpy/scipy), where I've done
> lots of prior time-series sensor data analysis. Unfortunately, the gaps
> have made direct use of the data futile, and I found I was spending all my
> time manipulating Python lists and numpy vectors rather than finding
> answers.
>
> I hope R can help calm my sea of unruly data. I'm presently working my way
> through the abundant R references (tutorials, wiki, etc.), but I was hoping
> to find pointers here to help me become productive sooner rather than later.
>
> Here's my present brute-force plan of attack:
>
> - Load both data sets (in CSV format). Each data element is a timestamp +
> 3-axis acceleration.
> - Determine timebase offset: The unit clocks don't match perfectly, and the
> units were started at slightly different times, so I expect to correlate
> common events in the data.
> - Find all overlapping data clusters (between superset of gaps).
> - See if I have enough data to perform spectral analysis. I'd like to
> analyze all clusters together, but I suspect I may have to analyze them
> independently, then combine the results.
>
> Thoughts? Hints?
>
You can use read.zoo in the zoo package to create a zoo time series
from a CSV file. The zoo merge method can merge two or more series
on their time index, leaving NAs wherever a series has no sample, and
na.locf (carry the last observation forward), na.approx (linear
interpolation) or na.spline (spline interpolation), also in zoo, can
be used to fill in those NAs. Three vignettes (PDF documents) come
with the zoo package and will get you up to speed.
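
A minimal sketch of that workflow, under some assumptions: the two
temporary CSV files and their values are made up for illustration, the
column names (time, x, y, z) are hypothetical, and the timestamps are
taken to be plain seconds rather than whatever format the loggers
actually write.

```r
library(zoo)

# Hypothetical stand-ins for the two logger files: a time column in
# seconds followed by 3-axis acceleration. Each file is missing samples
# that the other one has.
path_a <- tempfile(fileext = ".csv")
path_b <- tempfile(fileext = ".csv")
writeLines(c("time,x,y,z",
             "0.000,0.01,0.02,0.98",
             "0.002,0.02,0.01,0.99",
             "0.004,0.03,0.00,1.00",
             "0.010,0.00,0.03,0.97"), path_a)
writeLines(c("time,x,y,z",
             "0.000,0.00,0.02,1.01",
             "0.004,0.02,0.02,1.00",
             "0.006,0.01,0.01,0.99",
             "0.010,0.01,0.00,0.98"), path_b)

# read.zoo treats the first column as the time index by default.
a <- read.zoo(path_a, header = TRUE, sep = ",")
b <- read.zoo(path_b, header = TRUE, sep = ",")

# merge() keeps the union of the two time indexes; a series with no
# sample at a given time gets NA in its columns.
m <- merge(a, b)

# Fill the NAs by linear interpolation along the time index.
filled <- na.approx(m)

print(m)
print(filled)
```

na.approx also takes a maxgap argument that leaves runs of NAs longer
than the given length unfilled, which may be preferable for the longer
dropouts; na.locf or na.spline can be substituted on the same merged
object.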
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com