[R] How to speed up interpolation

jim holtman jholtman at gmail.com
Sun Jul 17 22:16:19 CEST 2011


It would be nice if you had included some sample data so that we could
see how the code works.  Have you used Rprof on the code to see where
you are spending your time?  You might want to use a matrix instead of
a data frame, since indexing a data frame is much slower than indexing
a matrix.  A little more description of the problem you are trying to
solve would also be useful.  I tend to ask people: "Tell me what you
want to do, not how you want to do it."
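Profiling with Rprof, as suggested above, is quick to try. The following is a minimal sketch on made-up toy data (not the poster's). It profiles element-by-element indexing of the same numbers stored once as a data frame and once as a matrix. The helper names `sum_df` and `sum_mat` are hypothetical. In output like this, most of the time typically shows up under `[.data.frame`.

```r
# Hypothetical toy data (not from the post): 50,000 rows of random numbers
set.seed(1)
n <- 50000
d <- data.frame(a = runif(n), b = runif(n))
m <- as.matrix(d)   # the same values stored as a numeric matrix

# Hypothetical helpers: sum column "a" one element at a time
sum_df  <- function(d) { s <- 0; for (i in seq_len(nrow(d))) s <- s + d[i, "a"]; s }
sum_mat <- function(m) { s <- 0; for (i in seq_len(nrow(m))) s <- s + m[i, "a"]; s }

prof_file <- tempfile()
Rprof(prof_file)          # start the sampling profiler
r1 <- sum_df(d)
r2 <- sum_mat(m)
Rprof(NULL)               # stop profiling
prof <- summaryRprof(prof_file)
head(prof$by.total)       # calls ranked by total time spent in them
unlink(prof_file)
```

Both helpers return the same sum. Only the storage type differs, so any gap in the profile comes from the indexing itself.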

On Sun, Jul 17, 2011 at 1:30 PM, James Rome <jamesrome at gmail.com> wrote:
> df is a very large data frame with arrival estimates for many flights
> (df$flightfact) at random times (df$PredTime). The error of each estimate
> is df$dt.
> My problem is that I want to know the prediction error at each minute
> before landing. This code works, but it is very slow and dominates the
> run time. I tried using split(), but that rapidly ate up my 12 GB of
> memory. So, is there a better R way of doing this?
>
> Thanks,
> Jim Rome
>
> # Interpolate the errors to every minute for 60 minutes
> # Return a new data frame
> # (defined before the loop so the script runs top to bottom)
> interpolateTimes = function(df) {
>   x = as.numeric(seq(from=0, to=60))  # The times to interpolate to
>   dti = approx(as.numeric(df$PredTime), as.numeric(df$dt), xout=x,
>                method="linear", rule=1)
>   # Make a new data frame of interpolated values
>   idf = data.frame(time=dti$x, error=dti$y,
>                    runway=rep(df$lrw[1], length(dti$x)),
>                    flight=rep(df$flightfact[1], length(dti$x)))
>   return(idf)
> }
>
>    flights = table(droplevels(df$flightfact))  # rows per flight
>    nflights = length(flights)
>    flights = as.data.frame(flights)            # columns Var1, Freq
>    times = data.frame()
>    # Split by flight
>    for(i in seq_len(nflights)) {
>        # This flight: compare labels, not integer level codes
>        tf = df[df$flightfact == as.character(flights$Var1[i]), ]
>        # check for at least 2 entries
>        if(nrow(tf) < 2) {
>            next
>        }
>        idf = interpolateTimes(tf)
>        times = rbind(times, idf)
>    }
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
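No sample data were posted, so here is a minimal sketch with made-up data that uses the column names from the question. It shows a common alternative to the loop above, which scans every row of df once per flight and copies the growing result with rbind() on each pass. The alternative works in three steps: split only the row indices (not the data frame) by flight, interpolate each group with approx(), and build the result data frame once. The function name `interpolate_all`, its `grid` argument and all data values are hypothetical, and the sketch has not been tested on the poster's data.

```r
# Hypothetical sample data (made up; column names follow the post):
# flight AA1 has 5 predictions, BA2 only 1 (skipped), CO3 has 6.
set.seed(42)
df <- data.frame(
  flightfact = factor(rep(c("AA1", "BA2", "CO3"), times = c(5, 1, 6))),
  PredTime   = c(runif(5, 0, 60), 30, runif(6, 0, 60)),  # minutes before landing
  dt         = rnorm(12),                                 # prediction error
  lrw        = rep(c("27L", "09R", "27L"), times = c(5, 1, 6))
)

# Hypothetical helper: interpolate each flight's error onto a minute grid
interpolate_all <- function(df, grid = 0:60) {
  # Split row *indices* by flight: one small integer vector per flight,
  # instead of a copy of every flight's rows as split(df, ...) makes
  idx <- split(seq_len(nrow(df)), df$flightfact, drop = TRUE)
  idx <- idx[lengths(idx) >= 2]          # approx() needs at least 2 points
  pred <- as.numeric(df$PredTime)
  err  <- as.numeric(df$dt)
  # rule = 1: minutes outside a flight's observed PredTime range become NA
  ys <- lapply(idx, function(rows)
    approx(pred[rows], err[rows], xout = grid, method = "linear", rule = 1)$y)
  first <- vapply(idx, function(rows) rows[1], integer(1))  # one row per flight
  k <- length(grid)
  # Build the result once rather than calling rbind() inside a loop
  data.frame(time   = rep(grid, length(idx)),
             error  = as.numeric(unlist(ys, use.names = FALSE)),
             runway = rep(df$lrw[first], each = k),
             flight = rep(df$flightfact[first], each = k))
}

res <- interpolate_all(df)
head(res)
```

If a flight has repeated PredTime values, approx() collapses the ties (averaging them by default) and warns. A flight with fewer than two distinct times still makes approx() stop with an error, so real data may need an extra filter on the number of unique times.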



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?


