[R] Need a faster function to replace missing data

Dieter Menne dieter.menne at menne-biomed.de
Fri May 22 15:01:05 CEST 2009


Tim Clark <mudiver1200 <at> yahoo.com> writes:

> I need some help in coming up with a function that will take two data sets,
> determine if a value is missing in one, find a value in the second that was
> taken at about the same time, and substitute the second value in for where
> the first should have been.

This is the type of job I would do with a database, not R (alone). The
main advantage is that you have to do the cleanup job only once and can
retrieve the data in a rather well-documented way later (it's possible
with R, I know).

>> Put the 5-minute data into one table. I would add two new columns giving
the delta to the next value for easier linear interpolation, but that's
secondary. Make sure to index the table.

>> Put the 1-second data into another table, adding values rounded to
5 seconds, and giving these an index.

>> From R/ODBC or with RSQLite, join all rows of table 1 that have NULL
values in the coordinates to the matching rows of table 2. If you do not
want to do a linear interpolation, you could even do this within the
database and SQL alone.

>> Compute the linear interpolation, and write the data back into
the database. If you want to be careful, you might mark the interpolated
values in a separate field as "computed".
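
The steps above could be sketched roughly as follows. This is only an
illustration, written in Python with its standard sqlite3 module rather than
R; the same SQL should work from R through RSQLite. The table names
(five_min, one_sec), the column names (t, t_round, x, y, computed) and the
sample data are all hypothetical, times are assumed to be integer seconds,
and "about the same time" is assumed to mean "rounds to the same 5-second
value".

```python
import sqlite3


def build_demo():
    """Create an in-memory database with hypothetical sample data."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE five_min (
            t INTEGER PRIMARY KEY,               -- seconds; the key is indexed
            x REAL, y REAL,                      -- NULL where the fix is missing
            computed INTEGER NOT NULL DEFAULT 0  -- 1 = filled by interpolation
        );
        CREATE TABLE one_sec (
            t INTEGER PRIMARY KEY,
            t_round INTEGER NOT NULL,            -- t rounded to 5 seconds
            x REAL, y REAL
        );
        CREATE INDEX one_sec_round ON one_sec (t_round);
    """)
    con.executemany(
        "INSERT INTO five_min (t, x, y) VALUES (?, ?, ?)",
        [(0, 10.0, 20.0), (300, None, None), (600, 16.0, 26.0), (900, None, None)],
    )
    one_sec = [(298, 11.0, 21.0), (299, 12.0, 22.0), (301, 14.0, 24.0)]
    con.executemany(
        "INSERT INTO one_sec (t, t_round, x, y) VALUES (?, ?, ?, ?)",
        [(t, 5 * round(t / 5), x, y) for t, x, y in one_sec],
    )
    con.commit()
    return con


def interpolate(t, samples):
    """Linearly interpolate (x, y) at time t from [(ts, x, y), ...] sorted by ts."""
    before = [s for s in samples if s[0] <= t]
    after = [s for s in samples if s[0] >= t]
    if not before:                 # only later samples: take the nearest one
        return after[0][1:]
    if not after:                  # only earlier samples: take the nearest one
        return before[-1][1:]
    (t0, x0, y0), (t1, x1, y1) = before[-1], after[0]
    if t1 == t0:                   # a sample exists at exactly time t
        return x0, y0
    w = (t - t0) / (t1 - t0)
    return x0 + w * (x1 - x0), y0 + w * (y1 - y0)


def fill_missing(con):
    """Fill NULL coordinates in five_min from one_sec; return rows filled."""
    rows = con.execute("""
        SELECT f.t, s.t, s.x, s.y
        FROM five_min AS f
        JOIN one_sec AS s ON s.t_round = f.t
        WHERE f.x IS NULL OR f.y IS NULL
        ORDER BY f.t, s.t
    """).fetchall()
    groups = {}
    for ft, st, x, y in rows:
        groups.setdefault(ft, []).append((st, x, y))
    for ft, samples in groups.items():
        x, y = interpolate(ft, samples)
        con.execute(
            "UPDATE five_min SET x = ?, y = ?, computed = 1 WHERE t = ?",
            (x, y, ft),
        )
    con.commit()
    return len(groups)


con = build_demo()
print(fill_missing(con))   # rows filled on the first run
print(fill_missing(con))   # a second run finds nothing left to fill
```

Rows with no nearby 1-second data (t = 900 in the sample) simply stay NULL,
and rows already filled are skipped on the next run because they no longer
match the IS NULL condition.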

When at a later time new data come in, you can run the procedure again
without penalty.

Dieter



