[R] Time series data with dropouts/gaps

Tue Oct 26 14:00:23 CEST 2010

On 10/25/2010 09:37 PM, Gabor Grothendieck wrote:
> On Tue, Oct 26, 2010 at 12:28 AM, Bob Cunningham <FlyMyPG at gmail.com> wrote:
>    
>> I have time-series data from a pair of inexpensive self-logging 3-axis
>> accelerometers (http://www.gcdataconcepts.com/xlr8r-1.html).  Since I'm not
>> sure of the vibration/shock spectrum I'm measuring, for my initial sensor
>> characterization run the units were mounted together with the sample rate
>> set to the maximum of 640 samples/sec.
>>
>> Unfortunately, at this sample rate there are significant data dropouts at
>> various scales (a phenomenon not present at data rates of 160 Hz and below):
>>
>> 1. Approximately every 20 ms, a few samples are dropped (believed to be due
>> to internal buffer wrapping).
>>
>> 2. Approximately every 200 ms, about 50 samples are dropped (believed to be
>> due to flash write times).
>>
>> 3. At seemingly random intervals, a sample will appear with an out-of-order
>> timestamp (vendor is diagnosing).
>>
>> Initially, I'm trying to answer the following questions:
>>
>> A. How well do the 2 units compare?  (Calibration, time-base drift, etc.)
>>
>> B. Can I use a lower sample rate?  (What is the observed spectrum?)
>>
>> I started attacking the problem in Python (numpy/scipy), where I've done
>> lots of prior time-series sensor data analysis.  Unfortunately, the gaps
>> have made direct use of the data futile, and I found I was spending all my
>> time manipulating Python lists and numpy vectors rather than finding
>> answers.
>>
>> I hope R can help calm my sea of unruly data.  I'm presently working my way
>> through the abundant R references (tutorials, wiki, etc.), but I was hoping
>> to find pointers here to help me become productive sooner rather than later.
>>
>> Here's my present brute-force plan of attack:
>>
>> - Load both data sets (in CSV format).  Each data element is a timestamp +
>> 3-axis acceleration.
>> - Determine timebase offset: The unit clocks don't match perfectly, and the
>> units were started at slightly different times, so I expect to correlate
>> common events in the data.
>> - Find all overlapping data clusters (the intervals between the union of
>> both units' gaps).
>> - See if I have enough data to perform spectral analysis.  I'd like to
>> analyze all clusters together, but I suspect I may have to analyze them
>> independently, then combine the results.
>>
>> Thoughts?  Hints?
>>
>>      
> You can use read.zoo in the zoo package to create a zoo time series
> from a csv file.  The zoo merge method can merge two or more series
> together and na.locf, na.approx or na.spline, also in zoo, could be
> used to fill in the NAs.  There are three vignettes (PDF documents)
> that come with the zoo package that will get you up to speed.
>    
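For comparison with the numpy/scipy approach the original poster started with, the clean, merge and fill steps can be sketched in Python. This is a minimal sketch, not zoo itself. The helper names (clean_series, find_gaps, fill_onto, merge_and_fill) are hypothetical. The sketch assumes timestamps in seconds, one 1-D array per axis, and a clock offset for unit B that is already known (for example, estimated separately by correlating common events). It also assumes that re-sorting the out-of-order samples is acceptable, which depends on the vendor's diagnosis. Linear interpolation stands in for na.approx, and the max_gap cutoff follows the same idea as na.approx's maxgap argument.

```python
import numpy as np


def clean_series(t, x):
    """Sort samples by timestamp and drop repeated timestamps (keeps the first).

    Assumption: out-of-order samples are simply re-sorted into place.
    """
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    order = np.argsort(t, kind="stable")
    t, x = t[order], x[order]
    keep = np.concatenate(([True], np.diff(t) > 0))
    return t[keep], x[keep]


def find_gaps(t, nominal_dt, tol=1.5):
    """Return (before, after) timestamp pairs where spacing exceeds tol * nominal_dt."""
    idx = np.nonzero(np.diff(t) > tol * nominal_dt)[0]
    return [(float(t[i]), float(t[i + 1])) for i in idx]


def fill_onto(t, t_src, x_src, max_gap=np.inf):
    """Linearly interpolate (t_src, x_src) onto times t.

    Times outside the source range stay NaN (no extrapolation), and so do
    times that fall inside a source gap wider than max_gap.
    """
    t = np.asarray(t, dtype=float)
    t_src = np.asarray(t_src, dtype=float)
    x_src = np.asarray(x_src, dtype=float)
    y = np.interp(t, t_src, x_src, left=np.nan, right=np.nan)
    j = np.searchsorted(t_src, t, side="right") - 1  # last source index with t_src <= t
    jj = np.clip(j, 0, len(t_src) - 2)  # left end of the bracketing interval
    exact = (j >= 0) & (t_src[np.clip(j, 0, None)] == t)
    inside = (j >= 0) & (j < len(t_src) - 1)
    too_wide = inside & ~exact & (t_src[jj + 1] - t_src[jj] > max_gap)
    y[too_wide] = np.nan
    return y


def merge_and_fill(t_a, x_a, t_b, x_b, offset_b=0.0, max_gap=np.inf):
    """Outer-merge two series on the union of their timestamps and fill by
    linear interpolation (loosely analogous to zoo's merge + na.approx)."""
    t_b = np.asarray(t_b, dtype=float) + offset_b  # offset_b: assumed known clock offset
    t = np.union1d(np.asarray(t_a, dtype=float), t_b)
    return t, fill_onto(t, t_a, x_a, max_gap), fill_onto(t, t_b, x_b, max_gap)
```

With the 640 samples/sec setting, find_gaps(t, 1 / 640) should flag both the short dropouts and the roughly 50-sample ones. Setting max_gap to a few sample periods leaves the larger gaps as NaN instead of bridging them with straight lines, which would otherwise distort any spectral estimate computed from the filled data.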

Wow, zoo looks great!  (It also looks to be a great source of future 
questions from me...)

Many, many thanks,

-BobC