--- title: "Generate a Departure Timetable" date: "2023-06-23" author: "Flavio Poletti" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Generate a Departure Timetable} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) library(tidytransit) library(dplyr) library(ggplot2) ``` In this vignette a departure timetable for a stop is generated and visualised. For some analysis it is important to know how and when a single stop is served and workflows to gather and plot such data can help with this analysis. ## Read GTFS data We use a feed from the New York Metropolitan Transportation Authority. It is provided as a sample feed with tidytransit but you can read it directly from the MTA's website. ```{r} local_gtfs_path <- system.file("extdata", "nyc_subway.zip", package = "tidytransit") gtfs <- read_gtfs(local_gtfs_path) # gtfs <- read_gtfs("http://web.mta.info/developers/data/nyct/subway/google_transit.zip") ``` ## trip_origin and trip_headsign To display where a bus (or any public transit vehicle) is headed on a timetable we need the column `trip_headsign` in `gtfs$trips`. This is an optional field but our example feed provides this information. To display where a vehicle comes from on the timetable we need to create a new column in `gtfs$trips` which we'll call `trip_origin`. ```{r} # get the id of the first stop in the trip's stop sequence first_stop_id <- gtfs$stop_times %>% group_by(trip_id) %>% summarise(stop_id = stop_id[which.min(stop_sequence)]) # join with the stops table to get the stop_name first_stop_names <- left_join(first_stop_id, gtfs$stops, by="stop_id") # rename the first stop_name as trip_origin trip_origins <- first_stop_names %>% select(trip_id, trip_origin = stop_name) # join the trip origins back onto the trips gtfs$trips <- left_join(gtfs$trips, trip_origins, by = "trip_id") ``` ```{r} gtfs$trips %>% select(route_id, trip_origin) %>% head() ``` In case `trip_headsign` does not exist in the feed it can be generated similarly to `trip_origin`: ```{r} if(!exists("trip_headsign", where = gtfs$trips)) { # get the last id of the trip's stop sequence trip_headsigns <- gtfs$stop_times %>% group_by(trip_id) %>% summarise(stop_id = stop_id[which.max(stop_sequence)]) %>% left_join(gtfs$stops, by="stop_id") %>% select(trip_id, trip_headsign.computed = stop_name) # assign the headsign to the gtfs object gtfs$trips <- left_join(gtfs$trips, trip_headsigns, by = "trip_id") } ``` ## Create A Departure Time Table To create a departure timetable, we first need to find the ids of all stops in the stops table with the same same name, as `stop_name` might cover different platforms and thus have multiple stop_ids in the stops table. ```{r} stop_ids <- gtfs$stops %>% filter(stop_name == "Times Sq - 42 St") %>% select(stop_id) ``` Note that multiple unrelated stops can have the same `stop_name`, see `cluster_stops()` for examples how to find these cases. ## Trips departing from stop To the selected stop_ids for Time Square, we can join trip columns: `route_id`, `service_id`, `trip_headsign`, and `trip_origin`. Because stop_ids and trips are linked via the `stop_times` data frame, we do this by joining the stop_ids we've selected to the stop_times data frame and then to the trips data frame. ```{r} departures <- stop_ids %>% inner_join(gtfs$stop_times %>% select(trip_id, arrival_time, departure_time, stop_id), by = "stop_id") departures <- departures %>% left_join(gtfs$trips %>% select(trip_id, route_id, service_id, trip_headsign, trip_origin), by = "trip_id") ``` ## add route info (route_short_name) Each trip belongs to a route, and the route short name can be added to the departures by joining the trips data frame with `gtfs$routes`. ```{r} departures <- departures %>% left_join(gtfs$routes %>% select(route_id, route_short_name), by = "route_id") ``` Now we have a data frame that tells us about the origin, destination, and time at which each train departs from Times Square for every possible schedule of service. ```{r} departures %>% select(arrival_time, departure_time, trip_headsign,trip_origin, route_id) %>% head() %>% knitr::kable() ``` However, we don't know days on which these trips run. Using the service_id column on our calculated departures and tidytransit's calculated `dates_services` data frame, we can filter trips to a given date of interest. ```{r} head(gtfs$.$dates_services) ``` Please see the `servicepatterns` vignette for further examples on how to use this table. ## Extract a single day Now we are ready to extract the same service table for any given day of the year. For example, for August 23rd 2018, a typical weekday, we can filter as follows: ```{r fig.width=8, fig.height=12} services_on_180823 <- gtfs$.$dates_services %>% filter(date == "2018-08-23") %>% select(service_id) departures_180823 <- departures %>% inner_join(services_on_180823, by = "service_id") ``` How services and trips are set up depends largely on the feed. For an idea how to handle other dates and questions about schedules have a look at the `servicepatterns` vignette. ```{r} departures_180823 %>% arrange(departure_time, stop_id, route_short_name) %>% select(departure_time, stop_id, route_short_name, trip_headsign) %>% filter(departure_time >= hms::hms(hours = 7)) %>% filter(departure_time < hms::hms(hours = 7, minutes = 10)) %>% knitr::kable() ``` ## Simple plot We'll now plot all departures from Times Square depending on trip_headsign and route. We can use the route colors provided in the feed. ```{r fig.width=8, fig.height=6} route_colors <- gtfs$routes %>% select(route_id, route_short_name, route_color) route_colors$route_color[which(route_colors$route_color == "")] <- "454545" route_colors <- setNames(paste0("#", route_colors$route_color), route_colors$route_short_name) ggplot(departures_180823) + theme_bw() + geom_point(aes(y=trip_headsign, x=departure_time, color = route_short_name), size = 0.2) + scale_x_time(breaks = seq(0, max(as.numeric(departures$departure_time)), 3600), labels = scales::time_format("%H:%M")) + theme(axis.text.x = element_text(angle = 90, hjust = 1)) + theme(legend.position = "bottom") + scale_color_manual(values = route_colors) + labs(title = "Departures from Times Square on 08/23/18") ``` Now we plot departures for all stop_ids with the same name, so we can separate for different stop_ids. The following plot shows all departures for stop_ids 127N and 127S from 7 to 8 AM. ```{r fig.width=7, fig.height=5} departures_180823_sub_7to8 <- departures_180823 %>% filter(stop_id %in% c("127N", "127S")) %>% filter(departure_time >= hms::hms(hours = 7) & departure_time <= hms::hms(hours = 8)) ggplot(departures_180823_sub_7to8) + theme_bw() + geom_point(aes(y=trip_headsign, x=departure_time, color = route_short_name), size = 1) + scale_x_time(breaks = seq(7*3600, 9*3600, 300), labels = scales::time_format("%H:%M")) + scale_y_discrete(drop = F) + theme(axis.text.x = element_text(angle = 90, hjust = 1)) + theme(legend.position = "bottom") + labs(title = "Departures from Times Square on 08/23/18") + facet_wrap(~stop_id, ncol = 1) ``` Of course this plot idea can be expanded further. You could also differentiate each route by direction (using direction_id, headsign, origin or next/previous stops). Another approach is to calculate frequencies and show different levels of service during the day, all depending on the goal of your analysis.