[R] Bus stop sequence matching problem

Charles Berry ccberry at ucsd.edu
Sat Aug 30 20:00:24 CEST 2014


Adam Lawrence <alaw005 <at> gmail.com> writes:

> 
> I am hoping someone can help me with a bus stop sequencing problem in R,
> where I need to match counts of people getting on and off a bus to the
> correct stop in the bus route stop sequence. I have tried looking
> online and in forums for sequence matching, but the results seem to refer
> to numeric sequences or DNA matching and are over my head. I am after a
> simple example, if anyone can please help.
> 

Adam,

Yet another way...

See inline code. BTW, you should have mentioned that you are
a transit planner or included a signature block so folks would know this
is not a homework question.

As others have noted/hinted, there are some unstated assumptions, so
you need to try some test cases to be sure any solution always works.
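For running such test cases, a quick sanity check on any candidate matching can help. This is a minimal sketch, assuming the matcher adds a `seq` column to stop_onoff: every matched seq should carry the same ref as its record, and the matched seqs should be strictly increasing along the route. `check_match()` is a hypothetical helper, not part of base R or of the code further down.

```r
## Hypothetical helper (not from the thread): sanity-check a proposed matching.
## Assumes stop_onoff has gained a 'seq' column from whatever matcher was used.
check_match <- function(stop_sequence, stop_onoff) {
  ## Route ref found at each matched seq (NA if the seq is not on the route)
  route_ref <- stop_sequence$ref[match(stop_onoff$seq, stop_sequence$seq)]
  ok_ref <- all(!is.na(route_ref)) &&
    all(as.character(route_ref) == as.character(stop_onoff$ref))
  ## Records must follow route order, so seq must strictly increase
  ok_order <- !is.unsorted(stop_onoff$seq, strictly = TRUE)
  ok_ref && ok_order
}

stop_sequence <- data.frame(seq = c(10, 20, 30, 40, 50, 60),
                            ref = c("A", "B", "C", "D", "B", "A"))
stop_onoff <- data.frame(ref = c("A", "D", "B", "A"),
                         on  = c(5, 0, 10, 0),
                         off = c(0, 2, 2, 6))

good <- transform(stop_onoff, seq = c(10, 40, 50, 60))
bad  <- transform(stop_onoff, seq = c(10, 40, 20, 60))  # B matched to the outbound stop
check_match(stop_sequence, good)  # TRUE
check_match(stop_sequence, bad)   # FALSE
```

The `bad` case has the right refs at every row but goes backwards along the route, which is exactly the kind of mismatch that is easy to miss by eye.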

You only have one outbound/inbound cycle in stop_onoff, right??
If not, I think almost any approach can fail given the right
sequence of 'seq's.
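If the route can contain more than one cycle, one alternative (not the approach used further down) is a greedy forward scan: match each record to the next stop on the route with the same ref after the previously matched stop. `match_forward()` is a hypothetical helper; it assumes the records are in travel order and that the earliest later occurrence is always the right one, so it shares the ambiguity just mentioned.

```r
## Hypothetical helper (not from the thread): greedy forward scan.
## Assumption: each record belongs to the next route stop with the same ref
## after the previously matched stop.
match_forward <- function(stop_sequence, stop_onoff) {
  stop_sequence <- stop_sequence[order(stop_sequence$seq), ]  # route order
  route_ref <- as.character(stop_sequence$ref)
  pos <- integer(nrow(stop_onoff))
  last <- 0L
  for (i in seq_len(nrow(stop_onoff))) {
    ## Route positions after the last match that have this record's ref
    later <- which(route_ref == as.character(stop_onoff$ref[i]) &
                   seq_along(route_ref) > last)
    if (length(later) == 0L)
      stop("no later stop '", stop_onoff$ref[i], "' for record ", i)
    last <- later[1L]
    pos[i] <- last
  }
  stop_sequence$seq[pos]
}

stop_sequence <- data.frame(seq = c(10, 20, 30, 40, 50, 60),
                            ref = c("A", "B", "C", "D", "B", "A"))
stop_onoff <- data.frame(ref = c("A", "D", "B", "A"),
                         on  = c(5, 0, 10, 0),
                         off = c(0, 2, 2, 6))
stop_onoff$seq <- match_forward(stop_sequence, stop_onoff)
stop_onoff$seq  # 10 40 50 60
```

The loop is O(records x stops), which should be fine for a single route; it stops with an error rather than guessing when a record cannot be placed.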


> I have two data series as per below (from a database) that I want to
> combine. In this example “stop_sequence” includes the sequence (seq) of bus
> stops and “stop_onoff” is a count of people getting on and off at certain
> stops (there is no entry if no one gets on or off).
> 
> stop_sequence <- data.frame(seq=c(10,20,30,40,50,60),
> ref=c('A','B','C','D','B','A'))
> ##   seq ref
> ## 1  10   A
> ## 2  20   B
> ## 3  30   C
> ## 4  40   D
> ## 5  50   B
> ## 6  60   A
> stop_onoff <- data.frame(ref=c('A','D','B','A'),
>                          on=c(5,0,10,0), off=c(0,2,2,6))
> ##   ref on off
> ## 1   A  5   0
> ## 2   D  0   2
> ## 3   B 10   2
> ## 4   A  0   6
> 
> I need to match the stop_onoff numbers to the right stop in the sequence,
> with the correctly matched output as follows (load is a cumulative count of
> on and off):
> 
> desired_output <- data.frame(seq=c(10,20,30,40,50,60),
> ref=c('A','B','C','D','B','A'),
> on=c(5,'-','-',0,10,0),off=c(0,'-','-',2,2,6), load=c(5,0,0,3,11,5))
> ##   seq ref on off load
> ## 1  10   A  5   0    5
> ## 2  20   B  -   -    0
> ## 3  30   C  -   -    0
> ## 4  40   D  0   2    3
> ## 5  50   B 10   2   11
> ## 6  60   A  0   6    5
> 
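The load column is a running total: at each recorded stop it is cumulative boardings minus cumulative alightings, which is the same as the cumulative sum of the net change `on - off`. A small check on the example data (the variable names are only for illustration):

```r
stop_onoff <- data.frame(ref = c("A", "D", "B", "A"),
                         on  = c(5, 0, 10, 0),
                         off = c(0, 2, 2, 6))
## Net boardings at each recorded stop, then the running total
net <- stop_onoff$on - stop_onoff$off             # 5 -2 8 -6
running_load <- cumsum(net)                       # 5 3 11 5
identical(running_load,
          with(stop_onoff, cumsum(on) - cumsum(off)))  # TRUE
```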

Start here:

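> ## running load after each on/off record
> stop_onoff$load <- with(stop_onoff, cumsum(on) - cumsum(off))
> ## candidate seqs for every ref on the route
> split.ref <- with(stop_sequence, split(seq, ref))
> split.ref.onoff <- split.ref[as.character(stop_onoff$ref)]
> ## one column per record, padded to two candidates
> stop.mat <- sapply(split.ref.onoff, rep, length.out = 2)
> ## TRUE where a candidate exceeds the same-row candidate of the previous record
> inout <- cbind(stop.mat, c(0, Inf)) > cbind(c(0, Inf), stop.mat)
> stop_onoff$seq <- head(stop.mat[inout], -1)
> merge(stop_sequence[c("ref", "seq")], stop_onoff[-1], by = "seq", all.x = TRUE)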
  seq ref on off load
1  10   A  5   0    5
2  20   B NA  NA   NA
3  30   C NA  NA   NA
4  40   D  0   2    3
5  50   B 10   2   11
6  60   A  0   6    5
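To see why the `inout` step picks the right seqs, here are the intermediate values for the example data. This re-runs the same steps with stop_onoff trimmed to its `ref` column; the comments describe what happens with this particular data. It relies on no stop appearing more than twice on the route and on each record column of `inout` having exactly one TRUE, which holds here but is what the single-cycle question above is about.

```r
stop_sequence <- data.frame(seq = c(10, 20, 30, 40, 50, 60),
                            ref = c("A", "B", "C", "D", "B", "A"))
stop_onoff <- data.frame(ref = c("A", "D", "B", "A"))

## Candidate seqs per record; rep() pads a stop visited once to two entries
split.ref <- with(stop_sequence, split(seq, ref))
stop.mat <- sapply(split.ref[as.character(stop_onoff$ref)], rep, length.out = 2)
## columns: A = (10, 60), D = (40, 40), B = (20, 50), A = (10, 60)

## Compare each candidate with the same-row candidate of the previous record;
## the leading (0, Inf) makes the first record take its first occurrence.
inout <- cbind(stop.mat, c(0, Inf)) > cbind(c(0, Inf), stop.mat)
## columns: (T, F), (T, F), (F, T), (F, T), (F, T)

## A logical matrix index works like a logical vector (column-major).
## inout has 10 cells but stop.mat only 8, so the final TRUE yields NA,
## which head(, -1) drops.
chosen <- head(stop.mat[inout], -1)
chosen  # 10 40 50 60
```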

You can take care of turning the NAs into zeroes or '-'s, I think.
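One way to do the zeroes, as a sketch: assume the merged result is stored in a data frame (called `res` here, a hypothetical name) and that load means passengers on board. Under that assumption load should carry through stops with no activity rather than drop to 0 as in the desired output, so it is simply recomputed from the filled on/off columns. If the zeros in the desired output are really wanted, `res$load[is.na(res$load)] <- 0` does that instead.

```r
## Merged result as printed above, re-created so the sketch runs on its own
res <- data.frame(seq  = c(10, 20, 30, 40, 50, 60),
                  ref  = c("A", "B", "C", "D", "B", "A"),
                  on   = c(5, NA, NA, 0, 10, 0),
                  off  = c(0, NA, NA, 2, 2, 6),
                  load = c(5, NA, NA, 3, 11, 5))

## Stops with no record had nobody boarding or alighting
res$on[is.na(res$on)]   <- 0
res$off[is.na(res$off)] <- 0

## Assumption: load = passengers on board, so recompute it along the route
## (merge() sorts by seq, so rows are already in route order)
res$load <- cumsum(res$on - res$off)
res$load  # 5 5 5 3 11 5
```

Using '-' instead would turn on and off into character columns, so it is probably best left to the final printing step.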

HTH,

Chuck


