[R] run a calculation function over time fields, ordered and grouped by variables
Jim Lemon
drjimlemon at gmail.com
Tue May 26 00:43:36 CEST 2015
Hi gavinr,
Perhaps this will do what you want.
add_HH_MM<-function(x) {
t1bits<-strsplit(as.character(x$tst),":")
t2bits<-strsplit(as.character(x$tint),":")
hours<-as.numeric(lapply(t1bits,"[",1))+cumsum(as.numeric(lapply(t2bits,"[",1)))
minutes<-as.numeric(lapply(t1bits,"[",2))+cumsum(as.numeric(lapply(t2bits,"[",2)))
next_hour<-minutes > 59
# adjust for running into the next hour
minutes[next_hour]<-minutes[next_hour]-60
hours[next_hour]<-hours[next_hour]+1
# adjust for running into the next day
hours[hours > 23]<-hours[hours > 23]-24
return(paste(formatC(hours,width=2,flag=0),formatC(minutes,width=2,flag=0),sep=":"))
}
df$tarr<-unlist(by(df,df$rtid,add_HH_MM))
Jim
On Tue, May 26, 2015 at 5:28 AM, gavinr <g.rudge at bham.ac.uk> wrote:
> I’ve got some transit data relating to bus stops for a GIS data set. Each
> row represents one stop on a route. For each record I have the start time
> of the route, a sequence in which a bus stops, the time the bus arrives at
> the first stop and the time taken to get to each of the stops from the last
> one in the sequence. Not all sequences of stops starts with the number 1,
> some may start with a higher number.
> I need to make a new variable which has the time the bus arrives at each
> stop by using the start time from the stop with the lowest sequence number,
> to populate all of the arrival times for each stop in each route.
>
> I have a very simple example below with just three routes and a few stops in
> each. My actual data set has a few million rows. I've also created a
> version of the data set I'm aiming to get.
>
> There are two problems here. Firstly getting the data into the correct
> format to do the calculations with
> durations, and secondly running a function over the data set to obtain the
> times.
> It is the durations that are critical not the date, so using the POSIX
> methods doesn’t really seem appropriate here. Ultimately the times are
> going to be used in a route solver in an ArcSDE geodatabase. I tried to use
> strptime to format my times, but could not get them into a data.frame as
> presumably they are a list. In this example I’ve left them as strings.
>
> Any help is much appreciated.
>
> #create four columns with route id, stop sequence interval time and route
> start time
> ssq<-c(3,4,5,6,7,8,9,1,2,3,4,2,3,4,5,6,7,8)
> tint<-c("00:00","00:12","00:03","00:06","00:09","00:02","00:04","00:00","00:08","00:10","00:10","00:00","00:02","00:04","00:08","00:02","00:01","00:01")
> tst<-c(rep("18:20",7),rep("10:50",4),rep("16:15",7))
> rtid<-c(rep("a",7),rep("b",4),rep("c",7))
> df<-data.frame(cbind(ssq,tint,tst,rtid))
> df
>
> #correct data set should look like this
> tarr<-c("18:20","18:32","18:35","18:41","18:50","18:52","18:56","10:50","10:58","11:08","11:18","16:15","16:17","16:21","16:29","16:31","16:32","16:33")
> df2<-cbind(df,tarr)
> df2
>
>
>
>
>
> --
> View this message in context: http://r.789695.n4.nabble.com/run-a-calculation-function-over-time-fields-ordered-and-grouped-by-variables-tp4707655.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list