[R] How to speed up interpolation

James Rome jamesrome at gmail.com
Mon Jul 18 00:58:57 CEST 2011


I thought I had included the data... Here it is again.

What I want to do is make box-and-whisker plots in which each flight is
counted the same number of times in each time bin. Hence the
interpolation to one-minute time marks.
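
A minimal sketch of one way the per-flight interpolation might be restructured, assuming the column names from the quoted code below (flightfact, PredTime, dt, lrw) and assuming PredTime is numeric minutes before landing. The toy data frame is invented purely for illustration and is not the real data. The idea is to split row indices rather than the data frame itself, interpolate each flight into one column of a matrix, and build the result data frame once instead of growing it with rbind(). Whether this is fast enough on the full data set has not been measured.

```r
# Hypothetical toy data standing in for the real df
# (column names taken from the quoted code; values are made up).
set.seed(1)
df <- data.frame(
  flightfact = factor(rep(c("AA1", "BB2", "CC3"), times = c(5, 4, 1))),
  PredTime   = c(2, 15, 30, 45, 58,  5, 20, 40, 55,  10),
  dt         = rnorm(10),
  lrw        = rep(c("26R", "27L", "26R"), times = c(5, 4, 1))
)

x <- 0:60  # the minutes to interpolate to

# Group row indices by flight. Splitting indices avoids making a
# copy of every column of df, which split(df, ...) would do.
idx <- split(seq_len(nrow(df)), df$flightfact, drop = TRUE)

# Keep only flights with at least 2 rows, as in the original loop.
idx <- idx[lengths(idx) >= 2]

# Interpolate each flight onto x.
# Result: a matrix with one row per minute and one column per flight.
# rule = 1 gives NA outside the range of that flight's PredTime.
err <- vapply(idx, function(i) {
  approx(df$PredTime[i], df$dt[i], xout = x, method = "linear", rule = 1)$y
}, numeric(length(x)))

# Index of the first row of each flight, used to look up its runway.
first <- vapply(idx, function(i) i[1], integer(1))

# Build the long data frame once, instead of rbind() inside the loop.
times <- data.frame(
  time   = rep(x, times = ncol(err)),
  error  = as.vector(err),                      # column-major: flight by flight
  runway = rep(df$lrw[first], each = length(x)),
  flight = rep(names(idx), each = length(x))
)
```

With times in this shape, each flight contributes at most one value per minute, so a call such as boxplot(error ~ time, data = times) would give one box per minute bin. The err matrix can also be used directly if the runway and flight columns are not needed.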


On 7/17/2011 4:16 PM, jim holtman wrote:
> It would be nice if you had some sample data included so that we could
> see how the code worked.  Have you used Rprof on the code to see where
> you are spending your time?  You might want to use 'matrix' instead of
> 'data.frames' since there is a big performance impact with dataframes
> when indexing.  A little more description of the problem you are
> trying to solve would also be useful.  I tend to ask people "tell me
> what you want to do, not how you want to do it".
>
> On Sun, Jul 17, 2011 at 1:30 PM, James Rome <jamesrome at gmail.com> wrote:
>> df is a very large data frame with arrival estimates for many flights
>> (df$flightfact) at random times (df$PredTime). The error of the estimate
>> is df$dt.
>> My problem is that I want to know the prediction error at each minute
>> before landing. This code works, but is very slow, and dominates
>> everything. I tried using split(), but that rapidly ate up my 12 GB of
>> memory. So, is there a better R way of doing this?
>>
>> Thanks,
>> Jim Rome
>>
>>    flights = table(df$flightfact[1:dim(df)[1], drop=TRUE])
>>    nflights = length(flights)
>>    flights = as.data.frame(flights)
>>    times = data.frame()
>>    # Split by flight
>>    for(i in 1:nflights) {
>>        tf = df[as.numeric(df$flightfact)==flights[i,1],]    # This flight
>>        #check for at least 2 entries
>>        if(dim(tf)[1] < 2) {
>>            next
>>        }
>>        idf = interpolateTimes(tf)
>>        times = rbind(times, idf)
>>    }
>>
>> # Interpolate the times to every minute for 60 minutes
>> # Return a new data frame
>> interpolateTimes = function(df) {
>>   x = as.numeric(seq(from=0,to=60)) # The times to interpolate to
>>   dti = approx(as.numeric(df$PredTime), as.numeric(df$dt), x,
>>                method="linear", rule=1)
>>   # Make a new data frame of interpolated values
>>   idf = data.frame(time=dti$x, error=dti$y,
>>       runway=rep(df$lrw[1], length(dti$x)),
>>       flight=rep(df$flightfact[1], length(dti$x)))
>>   return(idf)
>> }
>>
>>
>>
>
>


