[R] How to speed up interpolation

James Rome jamesrome at gmail.com
Sun Jul 17 19:30:21 CEST 2011

df is a very large data frame with arrival estimates for many flights
(df$flightfact) at random times (df$PredTime). The error of each estimate
is df$dt.
My problem is that I want to know the prediction error at each minute
before landing. The code below works, but it is very slow and dominates
the run time. I tried using split(), but that rapidly ate up my 12 GB of
memory. So, is there a better R way of doing this?

Jim Rome
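
To make the data layout concrete, here is a small made-up example. It assumes PredTime is already expressed in minutes before landing; the values are invented for illustration and are not taken from the real data. It interpolates a single flight with approx(), which is the per-flight step the code below repeats for every flight:

```r
# Made-up data for two flights; the column names follow the post,
# the values are invented for illustration.
df <- data.frame(
  flightfact = factor(c("A", "A", "A", "B", "B")),
  PredTime   = c(5, 20, 50, 10, 40),   # assumed: minutes before landing
  dt         = c(1, 4, 10, 2, 8)       # error of the estimate
)

# Interpolate flight A's error onto the whole minutes 0..60
one <- df[df$flightfact == "A", ]
res <- approx(one$PredTime, one$dt, xout = 0:60)

res$y[res$x == 20]   # an observed point: 4
res$y[res$x == 35]   # halfway between (20, 4) and (50, 10): 7
res$y[res$x == 0]    # outside the observed range: NA by default
```

With the default rule = 1, approx() returns NA for minutes outside the range of times observed for that flight.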

    flights = table(df$flightfact[1:dim(df)[1], drop=TRUE])
    nflights = length(flights)
    flights = as.data.frame(flights)
    times = data.frame()
    # Split by flight
    for(i in 1:nflights) {
        tf = df[as.numeric(df$flightfact)==flights[i,1],]    # This flight
        #check for at least 2 entries
        if(dim(tf)[1] < 2) {
            next    # approx() cannot interpolate a single point
        }
        idf = interpolateTimes(tf)
        times = rbind(times, idf)
    }

# Interpolate the times to every minute for 60 minutes
# Return a new data frame
interpolateTimes = function(df) {
   x = as.numeric(seq(from=0,to=60)) # The times to interpolate to
   # Linear interpolation (the approx() default); NA outside the observed range
   dti = approx(as.numeric(df$PredTime), as.numeric(df$dt), x)
   # Make a new data frame of interpolated values
   idf = data.frame(time=dti$x, error=dti$y,
                    flight=rep(df$flightfact[1], length(dti$x)))
   idf
}
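
Two things in the loop above are likely to be expensive: every iteration scans all of df to pick out one flight, and times is grown with rbind() on every pass, which copies the accumulated result each time. One possible alternative, offered as a sketch and not benchmarked on the real data, is to split only the row indices by flight, interpolate plain numeric vectors, and build the result data frame once. The name interpolateAll is hypothetical, and the code assumes the same columns (flightfact, PredTime, dt) as the post:

```r
# Sketch only: interpolate every flight without subsetting the data
# frame inside a loop. interpolateAll is a hypothetical name.
interpolateAll <- function(df, x = 0:60) {
  pt  <- as.numeric(df$PredTime)
  err <- as.numeric(df$dt)
  # Group row indices by flight. This is a list of integer vectors,
  # which should be much lighter than split(df, ...) on the whole frame.
  idx <- split(seq_len(nrow(df)), df$flightfact, drop = TRUE)
  # approx() needs at least two distinct non-NA times per flight
  ok <- vapply(idx, function(i) {
    keep <- !is.na(pt[i]) & !is.na(err[i])
    length(unique(pt[i][keep])) >= 2
  }, logical(1))
  idx <- idx[ok]
  # One column of interpolated errors per flight
  y <- vapply(idx, function(i) approx(pt[i], err[i], xout = x)$y,
              numeric(length(x)))
  # Assemble the result once instead of calling rbind() per flight
  data.frame(time   = rep(x, times = length(idx)),
             error  = as.vector(y),
             flight = rep(names(idx), each = length(x)))
}
```

Calling times = interpolateAll(df) would then replace the loop. Note that this version returns the flight as the factor level name and skips flights whose times are all identical, which is slightly stricter than the row-count check in the loop.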
