Relation Types and Restrictions

2018-01-30

This vignette explains how the relation_type and restrictions arguments work to ensure predictable mappings and check for correct relations between variables in your data. Examples also demonstrate how and when to use the arguments atomic, heterogenous_outputs, handle_duplicate_mappings, report_properties, and map_error_response.

Restrictions

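It is sometimes important to ensure that mappings have certain properties. For example, in a database you will probably require that every possible user ID maps to no more than one username. relatable functions can enforce this with either relation_type (explained below) or a list of restrictions, which is used if relation_type = NULL. The following restrictions can be applied:

min_one_y_per_x: every x in A maps to at least one y in B.
min_one_x_per_y: every y in B is mapped to by at least one x in A.
max_one_y_per_x: no x in A maps to more than one y in B.
max_one_x_per_y: no y in B is mapped to by more than one x in A.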
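To make the four restriction names concrete, here is a minimal base-R sketch that checks them for two vectors paired by position. The helper pair_properties is hypothetical and not part of relatable. It also assumes that, when the vectors differ in length, any element left without a partner counts as unmapped, which may not match the package's own handling.

```r
# Hypothetical helper (not part of relatable): check the four properties for
# vectors paired by position, so that A[i] maps to B[i].
pair_properties <- function(A, B) {
  n <- min(length(A), length(B))
  # Keep each distinct (x, y) pair once so repeated identical pairs do not
  # count as violations.
  pairs <- unique(data.frame(
    x = A[seq_len(n)],
    y = B[seq_len(n)],
    stringsAsFactors = FALSE
  ))
  list(
    # Assumption: trailing elements of the longer vector have no partner.
    min_one_y_per_x = length(A) <= length(B),
    min_one_x_per_y = length(B) <= length(A),
    # A repeated x among distinct pairs means one x has several y values.
    max_one_y_per_x = !any(duplicated(pairs$x)),
    # A repeated y among distinct pairs means one y has several x values.
    max_one_x_per_y = !any(duplicated(pairs$y))
  )
}

props <- pair_properties(c("a", "b", "c"), c(1, 2, 2))
# max_one_x_per_y is FALSE here because "b" and "c" both map to 2.
str(props)
```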

Thus for our ID to username mapping, we might want to create a function like this:

library(relatable)
valid_ids <- 10:99 # All possible ID numbers
usernames <- # List of usernames in order of entry
  c("Leonardo", "Michelangelo", "Raphael", "Donatello") 

get_user_from_id <- relation(
  A = valid_ids,
  B = usernames,
  default = "No username found",
  relation_type = NULL,
  restrictions = list(max_one_y_per_x = TRUE, max_one_x_per_y = TRUE),
  map_error_response = "throw"  # If restrictions are violated, return an error instead
)                               # of a warning

get_user_from_id(11)
#> [1] "Michelangelo"

Now, if the function is later updated to include a username that is already taken, an error will be thrown.
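As a rough illustration of the kind of input that would violate max_one_x_per_y, the base-R sketch below finds usernames that occur more than once. The second "Raphael" entry and the shortened ID range are invented for the example, and the check is independent of how relatable detects violations internally.

```r
# Toy data: a fifth user has been added with an already-taken username.
ids <- 10:14
usernames <- c("Leonardo", "Michelangelo", "Raphael", "Donatello", "Raphael")

# Usernames claimed by more than one ID.
taken_twice <- unique(usernames[duplicated(usernames)])

# IDs that share one of those usernames.
conflicting_ids <- ids[usernames %in% taken_twice]

taken_twice
conflicting_ids
```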

Relation types

Relation types translate to predetermined sets of restrictions, and are applied with a short string rather than a list. The following relation types are possible:

relation_type   min_one_y_per_x   min_one_x_per_y   max_one_y_per_x   max_one_x_per_y
one_to_one      FALSE             FALSE             TRUE              TRUE
many_to_many    FALSE             FALSE             FALSE             FALSE
one_to_many     FALSE             FALSE             FALSE             TRUE
many_to_one     FALSE             FALSE             TRUE              FALSE
func            TRUE              FALSE             TRUE              FALSE
injection       TRUE              FALSE             TRUE              TRUE
surjection      TRUE              TRUE              TRUE              FALSE
bijection       TRUE              TRUE              TRUE              TRUE
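The table above can be read as a lookup from a relation type to a list of restrictions. The sketch below transcribes it into a base-R matrix and expands a type name into a list of the same shape as the restrictions argument shown earlier. The names relation_types and restrictions_for are hypothetical, and nothing here is meant to describe how relatable stores these sets internally.

```r
# Restriction sets per relation type, transcribed from the table above.
relation_types <- rbind(
  one_to_one   = c(FALSE, FALSE, TRUE,  TRUE),
  many_to_many = c(FALSE, FALSE, FALSE, FALSE),
  one_to_many  = c(FALSE, FALSE, FALSE, TRUE),
  many_to_one  = c(FALSE, FALSE, TRUE,  FALSE),
  func         = c(TRUE,  FALSE, TRUE,  FALSE),
  injection    = c(TRUE,  FALSE, TRUE,  TRUE),
  surjection   = c(TRUE,  TRUE,  TRUE,  FALSE),
  bijection    = c(TRUE,  TRUE,  TRUE,  TRUE)
)
colnames(relation_types) <- c(
  "min_one_y_per_x", "min_one_x_per_y",
  "max_one_y_per_x", "max_one_x_per_y"
)

# Hypothetical helper: expand a relation type into a named restrictions list.
restrictions_for <- function(type) {
  as.list(relation_types[type, ])
}

str(restrictions_for("injection"))
```

Written this way, it is easy to see that "bijection" is simply all four restrictions at once, while "many_to_many" applies none of them.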

Examples

Enforce properties of relations between vectors

By default, the relation_type is assumed to be a function (“func”), meaning that each input maps to one and only one output. These restrictions can be tightened or loosened depending on your needs. Many-to-many relations have loose restrictions allowing multiple outputs from a single input:
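A rough base-R sketch of the idea, not using relatable itself: grouping the outputs by input shows at a glance whether any input has more than one output, which is what separates a function-like mapping from a many-to-many one. The vectors and variable names are made up for illustration.

```r
# Toy paired vectors: "a" appears twice with different outputs.
A <- c("a", "b", "a")
B <- c(1, 2, 3)

# Collect the distinct outputs for each input.
outputs_per_input <- lapply(split(B, A), unique)

# A function-like mapping has exactly one output per input.
n_outputs <- lengths(outputs_per_input)
is_function_like <- all(n_outputs == 1)

outputs_per_input
is_function_like
```

Here "a" has two outputs, so the pairing would only be acceptable under a relation type that does not set max_one_y_per_x.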

You may want exactly one unique output for each input (a bijection), but have duplicate mappings in your input vectors:

Determine relation properties for safer mappings

To illustrate some of the more advanced ways relatable can help ensure safer data manipulation, we will use the emperors data set of Ancient Roman Emperors assembled by GitHub user Zoni Nation.

report_properties can give you useful information about the relation between two vectors:

emperors <- read.csv(
  "https://raw.githubusercontent.com/zonination/emperors/master/emperors.csv",
  stringsAsFactors = FALSE
)
colnames(emperors)
#>  [1] "index"       "name"        "name.full"   "birth"       "death"      
#>  [6] "birth.cty"   "birth.prv"   "rise"        "reign.start" "reign.end"  
#> [11] "cause"       "killer"      "dynasty"     "era"         "notes"      
#> [16] "verif.who"

## Suppose we want a function to map each emperor to the time of their reign.
## First, let's check whether a unique mapping from either name or name.full is possible
## by using relation's report_properties argument:
relation(emperors$name.full, emperors$reign.start,
  relation_type = NULL,
  atomic = FALSE,
  report_properties = TRUE)
#> Relation properties:
#> min_one_y_per_x: TRUE
#> min_one_x_per_y: TRUE
#> max_one_y_per_x: FALSE
#> max_one_x_per_y: FALSE

relation(emperors$name, emperors$reign.start,
  relation_type = NULL,
  atomic = FALSE,
  report_properties = TRUE)
#> Relation properties:
#> min_one_y_per_x: TRUE
#> min_one_x_per_y: TRUE
#> max_one_y_per_x: TRUE
#> max_one_x_per_y: FALSE

## Neither mapping fulfils the criterion of max_one_x_per_y, but this is not a problem: in
## the later years of the Roman Empire, some emperors were co-rulers whose reigns began at
## the same time.
relate(c("0305-05-01", "0337-05-22"), emperors$reign.start, emperors$name,
  named = TRUE, relation_type = NULL, atomic = FALSE)
#> $`0305-05-01`
#> [1] "Constantius I" "Galerius"      "Severus II"   
#> 
#> $`0337-05-22`
#> [1] "Constantine II" "Consantius II"  "Constans"

## However, we can infer from max_one_y_per_x = FALSE that some elements of name.full are
## non-unique. This is because both Vespasian and his eldest son and successor Titus took
## the same imperial title.
relate(c("Vespasian", "Titus"), emperors$name, emperors$name.full,
  named = TRUE)
#>                                   Vespasian 
#> "TITVS FLAVIVS CAESAR VESPASIANVS AVGVSTVS" 
#>                                       Titus 
#> "TITVS FLAVIVS CAESAR VESPASIANVS AVGVSTVS"

## Hence we can determine that name and not name.full is a better choice for our mapping
## function.
reign_start <- relation(emperors$name, emperors$reign.start)
reign_start("Constantine the Great")
#> [1] "0306-07-25"

## Repeating the vector A lets us return several variables at once as an n-tuple
nice_date <- function(s) {
  d <- as.Date(s, "%Y-%m-%d")
  return(format.Date(d, "%d %B, %Y AD"))
}

reign_duration <- relation(
  rep(emperors$name, 2),
  nice_date(c(emperors$reign.start, emperors$reign.end)),
  relation_type = NULL,
  atomic = FALSE, named = TRUE
)
reign_duration(c("Vespasian", "Titus", "Domitian"))
#> $Vespasian
#> [1] "21 December, 69 AD" "24 June, 79 AD"    
#> 
#> $Titus
#> [1] "24 June, 79 AD"      "13 September, 81 AD"
#> 
#> $Domitian
#> [1] "14 September, 81 AD" "18 September, 96 AD"

## Or just for fun...
obituary <- with(
  emperors,
  relation(
    A = rep(name, 3),
    B = c(
      paste0("Born in ", birth.cty, ", ", birth.prv, " on ", nice_date(birth)),
      paste0("Came to power by ", rise, " on ", nice_date(reign.start)),
      paste0("Died from ", cause, " by ", killer, " on ", nice_date(death))
    ),
    relation_type = NULL,
    atomic = FALSE, named = TRUE
  )
)

obituary(
  c("Marcus Aurelius", "Commodus", "Pertinax", "Didius Julianus", "Septimus Severus", "Caracalla")
)
#> $`Marcus Aurelius`
#> [1] "Born in Rome, Italia on 26 April, 121 AD"               
#> [2] "Came to power by Birthright on 07 March, 161 AD"        
#> [3] "Died from Natural Causes by Disease on 17 March, 180 AD"
#> 
#> $Commodus
#> [1] "Born in Lanuvium, Italia on 31 August, 161 AD"                     
#> [2] "Came to power by Birthright on 01 January, 177 AD"                 
#> [3] "Died from Assassination by Praetorian Guard on 31 December, 192 AD"
#> 
#> $Pertinax
#> [1] "Born in Alba, Italia on 01 August, 126 AD"                             
#> [2] "Came to power by Appointment by Praetorian Guard on 01 January, 193 AD"
#> [3] "Died from Assassination by Praetorian Guard on 28 March, 193 AD"       
#> 
#> $`Didius Julianus`
#> [1] "Born in Milan, Italia on 30 January, 133 AD"     
#> [2] "Came to power by Purchase on 28 March, 193 AD"   
#> [3] "Died from Execution by Senate on 01 July, 193 AD"
#> 
#> $`Septimus Severus`
#> [1] "Born in Leptis Magna, Libya on 11 April, 145 AD"           
#> [2] "Came to power by Seized Power on 09 April, 193 AD"         
#> [3] "Died from Natural Causes by Disease on 04 February, 211 AD"
#> 
#> $Caracalla
#> [1] "Born in Lugdunum, Gallia Lugdunensis on 04 April, 188 AD"    
#> [2] "Came to power by Birthright on 01 January, 198 AD"           
#> [3] "Died from Assassination by Other Emperor on 08 April, 217 AD"
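
The trick of repeating the vector A, used for reign_duration and obituary above, can be mimicked in base R to see why it yields one tuple per key. The sketch below uses split() rather than relatable, with two emperors and the formatted dates copied from the reign_duration output. It is only an analogy for the pairing logic, not a description of the package's implementation.

```r
# Keys and two variables per key, copied from the reign_duration output above.
name  <- c("Vespasian", "Titus")
start <- c("21 December, 69 AD", "24 June, 79 AD")
end   <- c("24 June, 79 AD", "13 September, 81 AD")

# Repeating the keys pairs each name with both its start and its end value;
# grouping by key then gathers them into one tuple per name.
tuples <- split(c(start, end), rep(name, 2))

tuples[["Titus"]]
```

Because the values are concatenated in the order start, end, each tuple keeps that order, just as the second element of each reign_duration result is the end of the reign.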