opendatatoronto


opendatatoronto is an R interface to the City of Toronto Open Data Portal. The goal of the package is to help read data directly into R without needing to manually download it via the portal.

For more information, please visit the package website and vignettes.

Installation

You can install the released version of opendatatoronto from CRAN:

install.packages("opendatatoronto")

or the development version from GitHub with:

devtools::install_github("sharlagelfand/opendatatoronto", ref = "main")

Usage

In the Portal, datasets are called packages. You can see a list of available packages by using list_packages(). This returns metadata about each package, including the topics (i.e., tags) it covers, any civic issues it addresses, a description, the number of resources (and their formats), how often it is refreshed, and when it was last refreshed.

library(opendatatoronto)
packages <- list_packages(limit = 10)
packages
#> # A tibble: 10 × 11
#>    title            id    topics civic_issues publisher excerpt dataset_category
#>    <chr>            <chr> <chr>  <chr>        <chr>     <chr>   <chr>           
#>  1 Licensed Dogs a… lice… "Comm… NULL         Municipa… "The r… Table           
#>  2 Multi-Tenant (R… mult… "Perm… NULL         Municipa… "This … Table           
#>  3 Polls conducted… 7bce… "City… NULL         City Cle… "Polls… Table           
#>  4 Rain Gauge Loca… f293… "c(\"… NULL         Toronto … "This … Document        
#>  5 Sidewalk Constr… side… "Tran… NULL         Transpor… "The C… Map             
#>  6 Traffic Signal … 7dda… "Tran… Mobility     Transpor… "This … Document        
#>  7 Daily Shelter &… 21c8… "c(\"… NULL         Toronto … "Daily… Table           
#>  8 Traffic Volumes… traf… "Tran… Mobility     Transpor… "This … Table           
#>  9 Toronto Island … toro… "Tran… NULL         Parks, F… "This … Table           
#> 10 Toronto Open Da… open… "City… NULL         Informat… "This … Table           
#> # ℹ 4 more variables: num_resources <int>, formats <chr>, refresh_rate <chr>,
#> #   last_refreshed <date>
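Because list_packages() returns an ordinary tibble, its metadata columns can be filtered like any other data frame. The sketch below shows one way to keep only the packages that offer a given format. packages_with_format() is a hypothetical helper, not part of opendatatoronto, and it assumes that the formats column is a character string naming each available format.

```r
library(dplyr)

# Hypothetical helper (not part of opendatatoronto): keep only the packages
# whose `formats` column mentions `wanted`.
# Assumption: `formats` is a character column that names each format.
packages_with_format <- function(packages, wanted) {
  packages %>%
    # fixed = TRUE treats `wanted` as a literal string, not a regular expression
    filter(grepl(wanted, formats, fixed = TRUE))
}
```

For example, packages_with_format(packages, "CSV") would keep the rows of packages whose formats mention CSV. The match is a plain substring match, so "XLS" would also match "XLSX".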

You can also search packages by title:

ttc_packages <- search_packages("ttc")

ttc_packages
#> # A tibble: 15 × 11
#>    title            id    topics civic_issues publisher excerpt dataset_category
#>    <chr>            <chr> <chr>  <chr>        <chr>     <chr>   <chr>           
#>  1 TTC Subway Shap… c01c… "NULL" "NULL"       Toronto … "This … Document        
#>  2 TTC Ridership A… ef35… "Tran… "Mobility"   Toronto … "This … Document        
#>  3 TTC Routes and … 7795… "Tran… "NULL"       Toronto … "Data … Document        
#>  4 TTC Subway Dela… 996c… "Tran… "NULL"       Toronto … "TTC S… Document        
#>  5 TTC Bus Delay D… e271… "Tran… "NULL"       Toronto … "TTC B… Document        
#>  6 TTC Streetcar D… b68c… "Tran… "NULL"       Toronto … "TTC S… Document        
#>  7 TTC BusTime Rea… 31ed… "Tran… "Mobility"   Toronto … "This … Document        
#>  8 TTC Real-Time N… 8217… "Tran… "NULL"       Toronto … "The N… Document        
#>  9 TTC  - Ridershi… 2c4c… "c(\"… "c(\"Fiscal… Toronto … "This … Website         
#> 10 TTC - Monthly R… d2a7… "Tran… "NULL"       Toronto … "This … Website         
#> 11 TTC - Average W… 4b80… "Tran… "NULL"       Toronto … "This … Website         
#> 12 TTC Annual Pass… 1444… "Tran… "Mobility"   Toronto … "This … Website         
#> 13 TTC - Annual Pa… aedd… "Tran… "Mobility"   Toronto … "This … Website         
#> 14 TTC Ridership -… 4eb6… "Tran… "NULL"       Toronto … "This … Document        
#> 15 TTC Ridership -… d9dc… "Tran… "Mobility"   Toronto … "This … Document        
#> # ℹ 4 more variables: num_resources <int>, formats <chr>, refresh_rate <chr>,
#> #   last_refreshed <date>
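The printed tibble truncates the id column, but the full ids are still stored in it. A minimal sketch for pulling them out by title follows. find_package_ids() is a hypothetical helper, not part of opendatatoronto, and it relies only on the title and id columns shown above.

```r
library(dplyr)

# Hypothetical helper (not part of opendatatoronto): return the full ids of
# the packages whose title matches `pattern`.
# `pattern` is treated as a case-insensitive regular expression.
find_package_ids <- function(packages, pattern) {
  packages %>%
    filter(grepl(pattern, title, ignore.case = TRUE)) %>%
    pull(id)
}
```

Calling find_package_ids(ttc_packages, "subway delay") should return a character vector of untruncated ids, which can then be used wherever a package id is expected.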

Or see metadata for a specific package:

show_package("996cfe8d-fb35-40ce-b569-698d51fc683b")
#> # A tibble: 4 × 11
#>   title             id    topics civic_issues publisher excerpt dataset_category
#>   <chr>             <chr> <chr>  <chr>        <chr>     <chr>   <chr>           
#> 1 TTC Subway Delay… 996c… Trans… <NA>         Toronto … TTC Su… Document        
#> 2 TTC Subway Delay… 996c… Trans… <NA>         Toronto … TTC Su… Document        
#> 3 TTC Subway Delay… 996c… Trans… <NA>         Toronto … TTC Su… Document        
#> 4 TTC Subway Delay… 996c… Trans… <NA>         Toronto … TTC Su… Document        
#> # ℹ 4 more variables: num_resources <int>, formats <chr>, refresh_rate <chr>,
#> #   last_refreshed <date>

Within a package, there are a number of resources, e.g. CSV, XLSX, JSON, SHP files, and more. Resources are the actual “data”.

For a given package, you can get a list of resources using list_package_resources(). You can pass it the package id (which is contained in marriage_licence_packages below):

library(dplyr) # provides the %>% pipe used below

marriage_licence_packages <- search_packages("Marriage Licence Statistics")

marriage_licence_resources <- marriage_licence_packages %>%
  list_package_resources()

marriage_licence_resources
#> # A tibble: 4 × 4
#>   name                                  id                  format last_modified
#>   <chr>                                 <chr>               <chr>  <date>       
#> 1 Marriage Licence Statistics Data      4d985c1d-9c7e-4f74… CSV    2025-04-01   
#> 2 Marriage Licence Statistics Data.csv  01dff98a-b56b-4237… CSV    2025-04-01   
#> 3 Marriage Licence Statistics Data.xml  41148040-e29d-4a02… XML    2025-04-01   
#> 4 Marriage Licence Statistics Data.json 620da420-89be-4227… JSON   2025-04-01

You can also get a list of resources by using the package’s URL from the Portal:

list_package_resources("https://open.toronto.ca/dataset/sexual-health-clinic-locations-hours-and-services/")
#> # A tibble: 2 × 4
#>   name                                                id    format last_modified
#>   <chr>                                               <chr> <chr>  <date>       
#> 1 sexual-health-clinic-locations-hours-and-services-… 7076… XLSX   2019-08-15   
#> 2 Sexual-health-clinic-locations-hours-and-services-… 5af8… XLSX   2019-08-15
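When a package offers the same data in several formats, it can help to narrow the resource table by its format column instead of relying on row order. The following is a sketch under that assumption. resources_in_format() is a hypothetical helper, not part of opendatatoronto, and it uses only the format column shown in the output above.

```r
library(dplyr)

# Hypothetical helper (not part of opendatatoronto): keep only the resources
# whose `format` column equals `wanted_format`, ignoring case.
resources_in_format <- function(resources, wanted_format) {
  resources %>%
    filter(toupper(format) == toupper(wanted_format))
}
```

For instance, resources_in_format(marriage_licence_resources, "json") would keep only the JSON rows of the resource table.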

Finally (and most usefully!), you can download the resource (i.e., the actual data) directly into R using get_resource():

marriage_licence_statistics <- marriage_licence_resources %>%
  head(1) %>%
  get_resource()

marriage_licence_statistics
#> # A tibble: 558 × 4
#>    `_id` CIVIC_CENTRE MARRIAGE_LICENSES TIME_PERIOD
#>    <int> <chr>                    <int> <chr>      
#>  1 19231 ET                          80 2011-01    
#>  2 19232 NY                         136 2011-01    
#>  3 19233 SC                         159 2011-01    
#>  4 19234 TO                         367 2011-01    
#>  5 19235 ET                         109 2011-02    
#>  6 19236 NY                         150 2011-02    
#>  7 19237 SC                         154 2011-02    
#>  8 19238 TO                         383 2011-02    
#>  9 19239 ET                         177 2011-03    
#> 10 19240 NY                         231 2011-03    
#> # ℹ 548 more rows
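Once the resource is in R, it behaves like any other tibble. As an illustration, the sketch below totals licences per civic centre using the CIVIC_CENTRE and MARRIAGE_LICENSES columns shown above. licences_by_centre() is a hypothetical helper, not part of opendatatoronto, and it assumes those column names are unchanged in the downloaded data.

```r
library(dplyr)

# Hypothetical helper (not part of opendatatoronto): total marriage licences
# per civic centre, sorted from largest to smallest total.
licences_by_centre <- function(stats) {
  stats %>%
    group_by(CIVIC_CENTRE) %>%
    summarise(total_licences = sum(MARRIAGE_LICENSES), .groups = "drop") %>%
    arrange(desc(total_licences))
}
```

Calling licences_by_centre(marriage_licence_statistics) would return one row per civic centre with its total across all time periods.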