[R] Bus stop sequence matching problem
Charles Berry
ccberry at ucsd.edu
Sat Aug 30 20:00:24 CEST 2014
Adam Lawrence <alaw005 <at> gmail.com> writes:
>
> I am hoping someone can help me with a bus stop sequencing problem in R,
> where I need to match counts of people getting on and off a bus to the
> correct stop in the bus route stop sequence. I have tried searching
> online and the forums for sequence matching, but the results seem to
> refer to numeric sequences or DNA matching and are over my head. I am
> after a simple example, if anyone can please help.
>
Adam,
Yet another way...
See inline code. BTW, you should have mentioned that you are
a transit planner or included a signature block so folks would know this
is not a homework question.
As others have noted/hinted, there are some unstated assumptions, so
you need to try some test cases to be sure any solution always works.
You only have one outbound/inbound cycle in stop_onoff, right?
If not, I think almost any approach can fail given the right
sequence of 'seq' values.
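To make that assumption concrete, here is a minimal sketch of a *different* technique from the one used further down: a greedy forward scan that matches each stop_onoff row to the next visit of that stop after the previously matched one. It assumes the stop_onoff rows are in the order the stops were served; match_forward is a hypothetical helper, not something from this thread or from any package.

```r
## Hypothetical helper (not from the original thread): greedy forward scan.
## Assumes the rows of onoff_ref are in the order the stops were served.
match_forward <- function(seq_ref, seq_val, onoff_ref) {
  seq_ref <- as.character(seq_ref)      # avoid factor level-set mismatches
  onoff_ref <- as.character(onoff_ref)
  out <- numeric(length(onoff_ref))
  pos <- 0L  # position in the stop sequence of the last matched stop
  for (i in seq_along(onoff_ref)) {
    ## positions of this stop that come strictly after the last match
    hits <- which(seq_ref == onoff_ref[i] & seq_along(seq_ref) > pos)
    if (length(hits) == 0L)
      stop("no later visit to stop ", onoff_ref[i])
    pos <- hits[1L]          # take the earliest such visit
    out[i] <- seq_val[pos]
  }
  out
}

stop_sequence <- data.frame(seq = c(10, 20, 30, 40, 50, 60),
                            ref = c("A", "B", "C", "D", "B", "A"))
stop_onoff <- data.frame(ref = c("A", "D", "B", "A"),
                         on = c(5, 0, 10, 0), off = c(0, 2, 2, 6))
match_forward(stop_sequence$ref, stop_sequence$seq, stop_onoff$ref)
## [1] 10 40 50 60

## Drop the D row and B is matched to its first visit (seq 20), even if
## those passengers really boarded on the return leg at seq 50:
match_forward(stop_sequence$ref, stop_sequence$seq, c("A", "B", "A"))
## [1] 10 20 60
```

The second call shows the ambiguity: with no entry between the two visits to B, nothing in the data says which visit the counts belong to, so any rule (greedy or otherwise) is a guess.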
> I have two data series, as below (from a database), that I want to
> combine. In this example, “stop_sequence” holds the sequence (seq) of bus
> stops and “stop_onoff” is a count of people getting on and off at certain
> stops (there is no entry if no one gets on or off).
>
> stop_sequence <- data.frame(seq=c(10,20,30,40,50,60),
> ref=c('A','B','C','D','B','A'))
> ## seq ref
> ## 1 10 A
> ## 2 20 B
> ## 3 30 C
> ## 4 40 D
> ## 5 50 B
> ## 6 60 A
> stop_onoff <-
> data.frame(ref=c('A','D','B','A'),on=c(5,0,10,0),off=c(0,2,2,6))
> ## ref on off
> ## 1 A 5 0
> ## 2 D 0 2
> ## 3 B 10 2
> ## 4 A 0 6
>
> I need to match the stop_onoff numbers to the right stops in the sequence,
> with the correctly matched output as follows (load is the cumulative count
> of people on minus people off):
>
> desired_output <- data.frame(seq=c(10,20,30,40,50,60),
> ref=c('A','B','C','D','B','A'),
> on=c(5,'-','-',0,10,0),off=c(0,'-','-',2,2,6), load=c(5,0,0,3,11,5))
> ## seq ref on off load
> ## 1 10 A 5 0 5
> ## 2 20 B - - 0
> ## 3 30 C - - 0
> ## 4 40 D 0 2 3
> ## 5 50 B 10 2 11
> ## 6 60 A 0 6 5
>
Start here:
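stop_onoff$load <- with(stop_onoff, cumsum(on) - cumsum(off))  # running load
split.ref <- with(stop_sequence, split(seq, ref))    # seq values per stop ref
split.ref.onoff <- split.ref[as.character(stop_onoff$ref)] # candidates per row
## 2-row matrix: row 1 = first visit, row 2 = second visit
## (stops served only once are repeated to fill both rows)
stop.mat <- sapply(split.ref.onoff, rep, length.out = 2)
## TRUE where a visit is later than the same-row visit of the previous stop
inout <- cbind(stop.mat, c(0, Inf)) > cbind(c(0, Inf), stop.mat)
## the padding column adds one trailing NA, which head(..., -1) drops
stop_onoff$seq <- head(stop.mat[inout], -1)
merge(stop_sequence[c("ref", "seq")], stop_onoff[-1], by = "seq", all.x = TRUE)
##   seq ref on off load
## 1  10   A  5   0    5
## 2  20   B NA  NA   NA
## 3  30   C NA  NA   NA
## 4  40   D  0   2    3
## 5  50   B 10   2   11
## 6  60   A  0   6    5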
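If it helps to see why the inout comparison picks the right visits, the sketch below reruns the middle steps on the same data and spells out each comparison in comments. The name picked is illustrative only; the other objects are the ones used above.

```r
stop_sequence <- data.frame(seq = c(10, 20, 30, 40, 50, 60),
                            ref = c("A", "B", "C", "D", "B", "A"))
stop_onoff <- data.frame(ref = c("A", "D", "B", "A"),
                         on = c(5, 0, 10, 0), off = c(0, 2, 2, 6))

split.ref <- with(stop_sequence, split(seq, ref))
stop.mat <- sapply(split.ref[as.character(stop_onoff$ref)], rep, length.out = 2)
## stops A D B A -> row 1 (first visits):  10 40 20 10
##                  row 2 (second visits): 60 40 50 60
inout <- cbind(stop.mat, c(0, Inf)) > cbind(c(0, Inf), stop.mat)
## Column j compares stop j with stop j-1, row by row:
##   first visits : 10>0   40>10  20>40  10>20  -> T T F F
##   second visits: 60>Inf 40>60  50>40  60>50  -> F F T T
## so exactly one row is TRUE in each of the first four columns.
## Padding column 5 is (0 > 10, Inf > 60) = (F, T); that TRUE falls past
## the end of stop.mat, and logical indexing returns NA for it.
picked <- stop.mat[inout]
print(picked)            # 10 40 50 60 NA
print(head(picked, -1))  # 10 40 50 60
```

Note that the trick relies on each column having exactly one TRUE; with more than two visits to a stop, or an unlucky ordering, that no longer holds, which is the caveat about unstated assumptions.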
You can take care of turning the NAs into zeroes or '-'s, I think.
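A minimal sketch of that clean-up, assuming the merged result is the data frame printed above (rebuilt by hand here so the snippet runs on its own; res, res0, res_dash and num_cols are illustrative names). Zero-filling load as well reproduces the load column of desired_output.

```r
## Merged result as printed above, rebuilt so the sketch is self-contained
res <- data.frame(seq  = c(10, 20, 30, 40, 50, 60),
                  ref  = c("A", "B", "C", "D", "B", "A"),
                  on   = c(5, NA, NA, 0, 10, 0),
                  off  = c(0, NA, NA, 2, 2, 6),
                  load = c(5, NA, NA, 3, 11, 5))

## Option 1: zeroes in all the count columns
res0 <- res
num_cols <- c("on", "off", "load")
res0[num_cols][is.na(res0[num_cols])] <- 0

## Option 2: '-' for on/off, as in desired_output
## (this turns on and off into character columns)
res_dash <- res0
res_dash[c("on", "off")] <- lapply(res[c("on", "off")],
                                   function(x) ifelse(is.na(x), "-", x))
res_dash
```

Option 2 mirrors desired_output, where mixing '-' with numbers in c() already makes those columns character.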
HTH,
Chuck
More information about the R-help mailing list