Type: | Package |
Title: | Bayesian Inference for A|B and Bandit Marketing Tests |
Version: | 1.3.5 |
Author: | Ryan Angi |
Maintainer: | Ryan Angi <angi.ryan@gmail.com> |
Description: | Uses simple Bayesian conjugate prior update rules to calculate the win probability of each option, value remaining in the test, and percent lift over the baseline for various marketing objectives. References: Fink, Daniel (1997) "A Compendium of Conjugate Priors" https://www.johndcook.com/CompendiumOfConjugatePriors.pdf. Stucchio, Chris (2015) "Bayesian A/B Testing at VWO" https://vwo.com/downloads/VWO_SmartStats_technical_whitepaper.pdf. |
Depends: | R (≥ 2.10) |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
RoxygenNote: | 7.2.3 |
Imports: | purrr, dplyr, tidyr (≥ 1.0.0), magrittr, tibble, rlang |
Suggests: | spelling, knitr, testthat (≥ 2.1.0), rmarkdown |
Language: | en-US |
VignetteBuilder: | knitr |
URL: | https://github.com/rangi513/grizbayr |
BugReports: | https://github.com/rangi513/grizbayr/issues |
NeedsCompilation: | no |
Packaged: | 2023-10-09 16:41:06 UTC; rangi |
Repository: | CRAN |
Date/Publication: | 2023-10-09 17:00:02 UTC |
grizbayr: Bayesian Inference for A|B and Bandit Marketing Tests
Description
Uses simple Bayesian conjugate prior update rules to calculate the win probability of each option, value remaining in the test, and percent lift over the baseline for various marketing objectives. References: Fink, Daniel (1997) "A Compendium of Conjugate Priors" https://www.johndcook.com/CompendiumOfConjugatePriors.pdf. Stucchio, Chris (2015) "Bayesian A/B Testing at VWO" https://vwo.com/downloads/VWO_SmartStats_technical_whitepaper.pdf.
See Also
Useful links:
Calculate Multi Rev Per Session
Description
Calculate Multi Rev Per Session
Usage
calculate_multi_rev_per_session(conv_rates, inverse_rev_A, inverse_rev_B)
Arguments
conv_rates |
Dirichlet samples containing a tibble with columns alpha_1, alpha_2, and alpha_0 |
inverse_rev_A |
Vector of inverse revenue samples from A conversion type |
inverse_rev_B |
Vector of inverse revenue samples from B conversion type |
Value
Vector of samples (dbl)
Calculate Total CM
Description
Calculate Total CM
Usage
calculate_total_cm(rev_per_click, cost_per_click, expected_clicks)
Arguments
rev_per_click |
vector of rev per click samples |
cost_per_click |
vector of cost per click (cpc) samples |
expected_clicks |
vector of expected clicks (expected CTR * fixed impressions) |
Value
vector of CM estimates (dbl)
Estimate All Values
Description
Efficiently estimates all values at once so the posterior only need to be sampled one time. This function will return as a list win probability, value remaining, estimated percent lift with respect to the provided option, and the win probability of the best option vs the provided option.
Usage
estimate_all_values(
input_df,
distribution,
wrt_option_lift,
priors = list(),
wrt_option_vr = NULL,
loss_threshold = 0.95,
lift_threshold = 0.7,
metric = "lift"
)
Arguments
input_df |
Dataframe containing option_name (str) and various other columns depending on the distribution type. See vignette for more details. |
distribution |
String of the distribution name |
wrt_option_lift |
String: the option lift and win probability is calculated with respect to (wrt). Required. |
priors |
Optional list of priors. Defaults will be use otherwise. |
wrt_option_vr |
String: the option against which loss (value remaining) is calculated. If NULL the best option will be used. (optional) |
loss_threshold |
The confidence interval specifying what the "worst case scenario" should be. Defaults to 95%. (optional) |
lift_threshold |
The confidence interval specifying how likely the lift is to be true. Defaults to 70%. (optional) |
metric |
string the type of loss. absolute will be the difference, on the outcome scale. 0 when best = wrt_option lift will be the (best - wrt_option) / wrt_option, 0 when best = wrt_option relative_risk will be the ratio best/wrt_option, 1 when best = wrt_option |
Details
TODO: Add high density credible intervals to this output for each option.
Value
A list with 4 named items: Win Probability, Value Remaining, Lift vs Baseline, and Win Probability vs Baseline.
Examples
## Not run:
input_df <- data.frame(option_name = c("A", "B", "C"),
sum_clicks = c(1000, 1000, 1000),
sum_conversions = c(100, 120, 110), stringsAsFactors = FALSE)
estimate_all_values(input_df, distribution = "conversion_rate", wrt_option_lift = "A")
## End(Not run)
Estimate Lift Distribution
Description
Estimates lift distribution vector from posterior samples.
Usage
estimate_lift(posterior_samples, distribution, wrt_option, metric = "lift")
Arguments
posterior_samples |
Tibble returned from sample_from_posterior with 3 columns 'option_name', 'samples', and 'sample_id'. |
distribution |
String of the distribution name |
wrt_option |
string the option lift is calculated with respect to (wrt). Required. |
metric |
string the type of lift. 'absolute“ will be the difference, on the outcome scale. 0 when best = wrt_option 'lift“ will be the (best - wrt_option) / wrt_option, 0 when best = wrt_option 'relative_risk“ will be the ratio best/wrt_option, 1 when best = wrt_option |
Value
numeric, the lift distribution
Examples
# Requires posterior_samples dataframe. See `sample_from_posterior()`
# for an example.
## Not run:
estimate_lift(posterior_samples = posterior_samples,
distribution = "conversion_rate",
wrt_option = "A",
metric = "lift")
## End(Not run)
Estimate Lift vs Baseline
Description
Estimate Lift vs Baseline
Usage
estimate_lift_vs_baseline(
input_df,
distribution,
priors = list(),
wrt_option,
metric = "lift",
threshold = 0.7
)
Arguments
input_df |
Dataframe containing option_name (str) and various other columns depending on the distribution type. See vignette for more details. |
distribution |
String of the distribution name |
priors |
Optional list of priors. Defaults will be use otherwise. |
wrt_option |
string the option loss is calculated with respect to (wrt). Required. |
metric |
string the type of loss. absolute will be the difference, on the outcome scale. 0 when best = wrt_option lift will be the (best - wrt_option) / wrt_option, 0 when best = wrt_option relative_risk will be the ratio best/wrt_option, 1 when best = wrt_option |
threshold |
Lift percentage threshold between 0 and 1. (0.7 threshold is "at least 70% lift"). Defaults to 0.7. |
Value
numeric value remaining at the specified threshold
Examples
input_df <- tibble::tibble(option_name = c("A", "B", "C"),
sum_clicks = c(1000, 1000, 1000),
sum_conversions = c(100, 120, 110))
estimate_lift_vs_baseline(input_df, distribution = "conversion_rate", wrt_option = "A")
Estimate Loss
Description
Estimate Loss
Usage
estimate_loss(
posterior_samples,
distribution,
wrt_option = NULL,
metric = c("absolute", "lift", "relative_risk")
)
Arguments
posterior_samples |
Tibble: returned from sample_from_posterior with 3 columns 'option_name', 'samples', and 'sample_id'. |
distribution |
String: the name of the distribution |
wrt_option |
String: the option loss is calculated with respect to (wrt). If NULL, the best option will be chosen. |
metric |
String: the type of loss. absolute will be the difference, on the outcome scale. 0 when best = wrt_option lift will be the (best - wrt_option) / wrt_option, 0 when best = wrt_option relative_risk will be the ratio best/wrt_option, 1 when best = wrt_option |
Value
numeric, the loss distribution
Examples
# Requires posterior_samples dataframe. See `sample_from_posterior()`
# for an example.
## Not run:
estimate_loss(posterior_samples = posterior_samples, distribution = "conversion_rate")
## End(Not run)
Estimate Value Remaining
Description
Estimates value remaining or loss (in terms of percent lift, absolute, or relative).
Usage
estimate_value_remaining(
input_df,
distribution,
priors = list(),
wrt_option = NULL,
metric = "lift",
threshold = 0.95
)
Arguments
input_df |
Dataframe containing option_name (str) and various other columns depending on the distribution type. See vignette for more details. |
distribution |
String of the distribution name |
priors |
Optional list of priors. Defaults will be use otherwise. |
wrt_option |
string the option loss is calculated with respect to (wrt). If NULL, the best option will be chosen. |
metric |
string the type of loss. absolute will be the difference, on the outcome scale. 0 when best = wrt_option lift will be the (best - wrt_option) / wrt_option, 0 when best = wrt_option relative_risk will be the ratio best/wrt_option, 1 when best = wrt_option |
threshold |
The confidence interval specifying what the "worst case scenario should be. Defaults to 95%. (optional) |
Value
numeric value remaining at the specified threshold
Examples
input_df <- tibble::tibble(option_name = c("A", "B", "C"),
sum_clicks = c(1000, 1000, 1000),
sum_conversions = c(100, 120, 110))
estimate_value_remaining(input_df, distribution = "conversion_rate")
estimate_value_remaining(input_df,
distribution = "conversion_rate",
threshold = 0.99)
estimate_value_remaining(input_df,
distribution = "conversion_rate",
wrt_option = "A",
metric = "absolute")
Estimate Win Probability
Description
Creates a tibble of win probabilities for each option based on the data observed.
Usage
estimate_win_prob(input_df, distribution, priors = list())
Arguments
input_df |
Dataframe containing option_name (str) and various other columns depending on the distribution type. See vignette for more details. |
distribution |
String of the distribution name |
priors |
Optional list of priors. Defaults will be use otherwise. |
Value
tibble object with 2 columns: 'option_name' and 'win_probability' formatted as a percent
Examples
input_df <- tibble::tibble(
option_name = c("A", "B"),
sum_clicks = c(1000, 1000),
sum_conversions = c(100, 120)
)
estimate_win_prob(input_df, "conversion_rate")
Estimate Win Probability Given Posterior Distribution
Description
Estimate Win Probability Given Posterior Distribution
Usage
estimate_win_prob_given_posterior(posterior_samples, winner_is_max = TRUE)
Arguments
posterior_samples |
Tibble of data in long form with 2 columns 'option_name' and 'samples' |
winner_is_max |
Boolean. This should almost always be TRUE. If a larger number is better then this should be TRUE. This should be FALSE for metrics such as CPA or CPC where a higher cost is not necessarily better. |
Value
Tibble of each option_name and the win probability expressed as a percentage and a decimal 'raw'
Examples
# Requires posterior_samples dataframe. See `sample_from_posterior()`
# for an example.
## Not run:
estimate_win_prob_given_posterior(posterior_samples = posterior_samples)
estimate_win_prob_given_posterior(
posterior_samples = posterior_samples,
winner_is_max = TRUE
)
## End(Not run)
Estimate Win Probability vs. Baseline
Description
Calculates the win probability of the best option compared to a single other option given an input_df
Usage
estimate_win_prob_vs_baseline(
input_df,
distribution,
priors = list(),
wrt_option
)
Arguments
input_df |
Dataframe containing option_name (str) and various other columns depending on the distribution type. See vignette for more details. |
distribution |
String of the distribution name |
priors |
Optional list of priors. Defaults will be use otherwise. |
wrt_option |
string the option win prob is calculated with respect to (wrt). Required. |
Value
Tibble of each option_name and the win probability expressed as a percentage and a decimal 'raw'
Examples
input_df <- tibble::tibble(
option_name = c("A", "B", "C"),
sum_clicks = c(1000, 1000, 1000),
sum_conversions = c(100, 120, 110)
)
estimate_win_prob_vs_baseline(input_df = input_df,
distribution = "conversion_rate",
wrt_option = "B")
Estimate Win Probability vs. Baseline Given Posterior
Description
Calculates the win probability of the best option compared to a single other option given a posterior distribution.
Usage
estimate_win_prob_vs_baseline_given_posterior(
posterior_samples,
distribution,
wrt_option
)
Arguments
posterior_samples |
Tibble returned from sample_from_posterior with 3 columns 'option_name', 'samples', and 'sample_id'. |
distribution |
String: the distribution name |
wrt_option |
String: the option to compare against the best option. |
Value
Tibble of each option_name and the win probability expressed as a percentage and a decimal 'raw'
Examples
# Requires posterior_samples dataframe. See `sample_from_posterior()`
# for an example.
## Not run:
estimate_win_prob_vs_baseline_given_posterior(
posterior_samples = posterior_samples,
distribution = "conversion_rate",
wrt_option = "A")
## End(Not run)
Find Best Option
Description
Samples from posterior, calculates win probability, and selects the best option. Note: this can be inefficient if you already have the win probability dataframe. Only use this if that has not already been calculated.
Usage
find_best_option(posterior_samples, distribution)
Arguments
posterior_samples |
Tibble returned from sample_from_posterior with 3 columns 'option_name', 'samples', and 'sample_id'. |
distribution |
String: name of the distribution |
Value
String: the best option name
Examples
# Requires posterior distribution
## Not run:
find_best_option(posterior_samples = posterior_samples, distribution = "conversion_rate")
## End(Not run)
Impute Missing Options
Description
When win probability is calculated
Usage
impute_missing_options(posterior_samples, wp_raw)
Arguments
posterior_samples |
Tibble of data in long form with 2 columns 'option_name' and 'samples' |
wp_raw |
Tibble of win probabilities with the columns: 'option_name' and 'win_prob_raw' |
Value
wp_raw table with new rows if option names were missing.
Is Prior Valid
Description
Checks if a single valid prior name is in the list of prior values and if that prior value from the list is greater than 0.
Usage
is_prior_valid(priors_list, valid_prior)
Arguments
priors_list |
A list of valid priors |
valid_prior |
A character string |
Value
Boolean (TRUE/FALSE)
Is Winner Max
Description
Determines if the max or min function should be used for win probability. If CPA or CPC distribution, lower is better, else higher number is better.
Usage
is_winner_max(distribution)
Arguments
distribution |
String: the name of the distribution |
Value
Boolean TRUE/FALSE
Random Dirichlet
Description
Randomly samples a vector of length n from a dirichlet distribution parameterized by a vector of alphas PDF of Gamma with scale = 1 : f(x)= 1/(Gamma(a)) x^(a-1) e^-(x)
Usage
rdirichlet(n, alphas_list)
Arguments
n |
integer, the number of samples |
alphas_list |
Named List of Integers: parameters of the dirichlet, interpreted as the number of success of each outcome |
Value
n x length(alphas) named tibble representing the probability of observing each outcome
Examples
rdirichlet(100, list(a = 20, b = 15, c = 60))
Sample CM Per Click
Description
Adds 4 new nested columns to the input_df: 'beta_params', 'gamma_params_rev', 'gamma_params_cost'and 'samples'
Usage
sample_cm_per_click(input_df, priors, n_samples = 50000)
Arguments
input_df |
Dataframe containing option_name (str), sum_conversions (dbl), sum_revenue (dbl), and sum_clicks (dbl). |
priors |
Optional list of priors alpha0, beta0 for Beta, k0, theta0 for Gamma Inverse Revenue,
and k01, theta01 for Gamma Cost (uses alternate priors so they can be different from Revenue).
Default |
n_samples |
Optional integer value. Defaults to 50,000 samples. |
Details
'beta_params' and 'gamma_params_rev' in each row should be a
tibble of length 2 (\alpha
and \beta
parameters
and k
and \theta
parameters)
'samples' in each row should be a tibble of length 'n_samples'
See update_rules vignette for a mathematical representation.
CMPerClick = ConversionsPerClick * RevPerConversion - CostPerClick
Value
input_df with 4 new nested columns 'beta_params', 'gamma_params_rev', 'gamma_params_cost', and 'samples'
Sample Conversion Rate
Description
Adds 2 new nested columns to the input_df: 'beta_params' and 'samples'
'beta_params' in each row should be a tibble of length 2 (\alpha
and \beta
parameters)
'samples' in each row should be a tibble of length 'n_samples'
Usage
sample_conv_rate(input_df, priors, n_samples = 50000)
Arguments
input_df |
Dataframe containing option_name (str), sum_conversions (dbl), and sum_clicks (dbl). |
priors |
Optional list of priors alpha0 and beta0.
Default |
n_samples |
Optional integer value. Defaults to 50,000 samples. |
Details
See update_rules vignette for a mathematical representation.
conversion_i ~ Bernoulli(\phi)
\phi ~ Beta(\alpha, \beta)
Conversion Rate is sampled from a Beta distribution with a Binomial likelihood of an individual converting.
Value
input_df with 2 new nested columns 'beta_params' and 'samples'
Sample Cost Per Activation (CPA)
Description
Adds 3 new nested columns to the input_df: 'beta_params', 'gamma_params', and 'samples'
'beta_params' and 'gamma_params' in each row should be a tibble of length 2 (\alpha
and \beta
parameters and k
and \theta
parameters)
'samples' in each row should be a tibble of length 'n_samples'
Usage
sample_cpa(input_df, priors, n_samples = 50000)
Arguments
input_df |
Dataframe containing option_name (str), sum_conversions (dbl), sum_cost (dbl), and sum_clicks (dbl). |
priors |
Optional list of priors alpha0, beta0 for Beta and k0, theta0
for Gamma.
Default |
n_samples |
Optional integer value. Defaults to 50,000 samples. |
Details
See update_rules vignette for a mathematical representation. This is a combination of a Beta-Bernoulli update and a Gamma-Exponential update.
conversion_i ~ Bernoulli(\phi)
cpc_i ~ Exponential(\lambda)
\phi ~ Beta(\alpha, \beta)
\lambda ~ Gamma(k, \theta)
cpa_i ~ 1/ (Bernoulli(\phi) * Exponential(\lambda))
averageCPA ~ 1/(\phi\lambda)
Conversion Rate is sampled from a Beta distribution with a Binomial likelihood of an individual converting.
Average CPC is sampled from a Gamma distribution with an Exponential likelihood of an individual cost.
Value
input_df with 3 new nested columns 'beta_params', 'gamma_params', and 'samples'
Sample Cost Per Click
Description
Adds 2 new nested columns to the input_df: 'gamma_params' and 'samples'
'gamma_params' in each row should be a tibble of length 2 (k
and \theta
parameters)
'samples' in each row should be a tibble of length 'n_samples'
Usage
sample_cpc(input_df, priors, n_samples = 50000)
Arguments
input_df |
Dataframe containing option_name (str), sum_clicks (dbl), sum_cost (dbl). |
priors |
Optional list of priors k0, theta0 for Gamma.
Default |
n_samples |
Optional integer value. Defaults to 50,000 samples. |
Details
See update_rules vignette for a mathematical representation.
cpc_i ~ Exponential(\lambda)
\lambda ~ Gamma(k, \theta)
Average CPC is sampled from a Gamma distribution with an Exponential likelihood of an individual cost.
Value
input_df with 2 new nested columns 'gamma_params' and 'samples'
Sample Click Through Rate
Description
This is an alias for sample_conv_rate with 2 different input
columns. This function calculates posterior samples of
CTR = clicks/impressions
. Adds 2 new nested columns to
the input_df: 'beta_params' and 'samples'.
'beta_params' in each row should be a tibble of length 2 (\alpha
and \beta
parameters)
'samples' in each row should be a tibble of length 'n_samples'
Usage
sample_ctr(input_df, priors, n_samples = 50000)
Arguments
input_df |
Dataframe containing option_name (str), sum_clicks (dbl), and sum_impressions (dbl). |
priors |
Optional list of priors alpha0 and beta0.
Default |
n_samples |
Optional integer value. Defaults to 50,000 samples. |
Details
See update_rules vignette for a mathematical representation.
click_i ~ Bernoulli(\phi)
\phi ~ Beta(\alpha, \beta)
Click Through Rate is sampled from a Beta distribution with a Binomial likelihood of an individual Clicking
Value
input_df with 2 new nested columns 'beta_params' and 'samples'
Sample From Posterior
Description
Selects which function to use to sample from the posterior distribution
Usage
sample_from_posterior(
input_df,
distribution,
priors = list(),
n_samples = 50000
)
Arguments
input_df |
Dataframe containing option_name (str) and various other columns depending on the distribution type. See vignette for more details. |
distribution |
String of the distribution name |
priors |
Optional list of priors. Defaults will be use otherwise. |
n_samples |
Optional integer value. Defaults to 50,000 samples. |
Value
A tibble with 2 columns: option_name (chr) and samples (dbl) [long form data].
Examples
input_df <- tibble::tibble(
option_name = c("A", "B"),
sum_clicks = c(1000, 1000),
sum_conversions = c(100, 120),
sum_sessions = c(1000, 1000),
sum_revenue = c(1000, 1500)
)
sample_from_posterior(input_df, "conversion_rate")
sample_from_posterior(input_df, "rev_per_session")
Sample Multiple Revenue Per Session
Description
Adds 5 new nested columns to the input_df: 'dirichlet_params', 'gamma_params_A', 'gamma_params_B', and 'samples'. This samples from multiple revenue per session distributions at once.
Usage
sample_multi_rev_per_session(input_df, priors, n_samples = 50000)
Arguments
input_df |
Dataframe containing option_name (str), sum_conversions (dbl), sum_sessions (dbl), sum_revenue (dbl), sum_conversion_2 (dbl), sum_sessions_2 (dbl), sum_revenue_2 (dbl). |
priors |
Optional list of priors alpha0 and beta0.
Default |
n_samples |
Optional integer value. Defaults to 50,000 samples. |
Details
See update_rules vignette for a mathematical representation.
conversion_i ~ MultiNomial(\phi_1, \phi_2, ..., \phi_k)
\phi_k ~ Dirichlet(\alpha, \beta)
Conversion Rate is sampled from a Dirichlet distribution with a Multinomial likelihood of an individual converting.
Value
input_df with 4 new nested columns 'dirichlet_params', 'gamma_params_A', 'gamma_params_B', and 'samples'. 'samples' in each row should be a tibble of length 'n_samples'.
Sample Page Views Per Session (Visit)
Description
Adds 2 new nested columns to the input_df: 'gamma_params' and 'samples'
'gamma_params' in each row should be a tibble of length 2 (k
and \theta
parameters)
'samples' in each row should be a tibble of length 'n_samples'
Usage
sample_page_views_per_session(input_df, priors, n_samples = 50000)
Arguments
input_df |
Dataframe containing option_name (str), sum_sessions (dbl), and sum_page_views_per_session (dbl). |
priors |
Optional list of priors k0 and theta0.
Default |
n_samples |
Optional integer value. Defaults to 50,000 samples. |
Details
See update_rules vignette for a mathematical representation.
page_views_i ~ Poisson(\lambda)
\lambda ~ Gamma(k, \theta)
Page Views Per Visit is sampled from a Gamma distribution with a Poisson likelihood of an individual having n page views by the end of their session.
This is not always the case, so verify your data follows the shape of an Poisson distribution before using this.
Value
input_df with 2 new nested columns 'gamma_params' and 'samples'
Sample Response Rate
Description
This is an alias for sample_conv_rate with a different input column.
Adds 2 new nested columns to the input_df: 'beta_params' and 'samples'
'beta_params' in each row should be a tibble of length 2 (\alpha
and \beta
parameters)
'samples' in each row should be a tibble of length 'n_samples'
Usage
sample_response_rate(input_df, priors, n_samples = 50000)
Arguments
input_df |
Dataframe containing option_name (str), sum_conversions (dbl), and sum_sessions (dbl). |
priors |
Optional list of priors alpha0 and beta0.
Default |
n_samples |
Optional integer value. Defaults to 50,000 samples. |
Details
See update_rules vignette for a mathematical representation.
conversion_i ~ Bernoulli(\phi)
\phi ~ Beta(\alpha, \beta)
Response Rate is sampled from a Beta distribution with a Binomial likelihood of an individual converting.
Value
input_df with 2 new nested columns 'beta_params' and 'samples'
Sample Rev Per Session
Description
Adds 3 new nested columns to the input_df: 'beta_params', 'gamma_params', and 'samples'
'beta_params' and 'gamma_params' in each row should be a tibble of length 2 (\alpha
and \beta
parameters and k
and \theta
parameters)
'samples' in each row should be a tibble of length 'n_samples'
Usage
sample_rev_per_session(input_df, priors, n_samples = 50000)
Arguments
input_df |
Dataframe containing option_name (str), sum_conversions (dbl), sum_revenue (dbl), and sum_clicks (dbl). |
priors |
Optional list of priors alpha0, beta0 for Beta
and k0, theta0 for Gamma. Default |
n_samples |
Optional integer value. Defaults to 50,000 samples. |
Details
See update_rules vignette for a mathematical representation.
RevPerSession = RevPerOrder * OrdersPerClick
This is a combination of a Beta-Bernoulli update and a Gamma-Exponential update.
conversion_i ~ Bernoulli(\phi)
revenue_i ~ Exponential(\lambda)
\phi ~ Beta(\alpha, \beta)
\lambda ~ Gamma(k, \theta)
revenue_i ~ Bernoulli(\phi) * Exponential(\lambda)^-1)
Rev Per Session ~ \phi / \lambda
Conversion Rate is sampled from a Beta distribution with a Binomial likelihood of an individual converting.
Average Rev Per Order is sampled from a Gamma distribution with an Exponential likelihood of Revenue from an individual order. This function makes sense to use if there is a distribution of possible revenue values that can be produced from a single order or conversion.
Value
input_df with 3 new nested columns 'beta_params', 'gamma_params', and 'samples'
Sample Session Duration
Description
Adds 2 new nested columns to the input_df: 'gamma_params' and 'samples'
'gamma_params' in each row should be a tibble of length 2 (k
and \theta
parameters)
'samples' in each row should be a tibble of length 'n_samples'
Usage
sample_session_duration(input_df, priors, n_samples = 50000)
Arguments
input_df |
Dataframe containing option_name (str), sum_sessions (dbl), and sum_duration (dbl). |
priors |
Optional list of priors k0 and theta0.
Default |
n_samples |
Optional integer value. Defaults to 50,000 samples. |
Details
See update_rules vignette for a mathematical representation.
duration_i ~ Exponential(\lambda)
\lambda ~ Gamma(k, \theta)
Session Duration is sampled from a Gamma distribution with a Exponential likelihood of an individual leaving the site or ending a session at time t.
This is not always the case, so verify your data follows the shape of an exponential distribution before using this.
Value
input_df with 2 new nested columns 'gamma_params' and 'samples'
Sample Total CM (Given Impression Count)
Description
Adds 4 new nested columns to the input_df: 'beta_params_ctr', 'beta_params_conv','gamma_params_rev', 'gamma_params_cost' and 'samples'.
Usage
sample_total_cm(input_df, priors, n_samples = 50000)
Arguments
input_df |
Dataframe containing option_name (str), sum_conversions (dbl), sum_revenue (dbl), and sum_clicks (dbl). |
priors |
Optional list of priors alpha0, beta0 for Beta,
k0, theta0 for Gamma Inverse Revenue, and k01, theta01 for
Gamma Cost (uses alternate priors so they can be different from Revenue).
Default |
n_samples |
Optional integer value. Defaults to 50,000 samples. |
Details
'beta_params' and 'gamma_params' in each row should be a tibble of length 2
(\alpha
and \beta
params and k
and \theta
params).
'samples' in each row should be a tibble of length 'n_samples'.
One assumption in this model is that sum_impressions is not stochastic. This assumes that Clicks are stochastically generated from a set number of Impressions. It does not require that the number of impressions are equal on either side. Generally this assumption holds true in marketing tests where traffic is split 50/50 and very little variance is observed in the number of impressions on either side.
See update_rules vignette for a mathematical representation.
TotalCM = Impr * ExpectedCTR * (RevPerOrder * OrdersPerClick - ExpectedCPC)
Value
input_df with 5 new nested columns 'beta_params_conv', 'beta_params_ctr', 'gamma_params_rev','gamma_params_cost', and 'samples'
Update Beta
Description
Updates Beta Distribution with the Beta-Bernoulli conjugate prior update rule
Usage
update_beta(alpha, beta, priors = list())
Arguments
alpha |
Double value for alpha (count of successes). Must be 0 or greater. |
beta |
Double value for beta (count of failures). Must be 0 or greater. |
priors |
An optional list object that contains alpha0 and beta0. Otherwise the function with use Beta(1,1) as the prior distribution. |
Value
A tibble object that contains 'alpha' and 'beta'
Examples
update_beta(alpha = 1, beta = 5, priors = list(alpha0 = 2, beta0 = 2))
update_beta(alpha = 20000, beta = 50000)
Update Dirichlet Distribution
Description
This function updates the Dirichlet distribution with the Dirichlet-Multinomial conjugate prior update rule.
Usage
update_dirichlet(alpha_0, alpha_1, alpha_2, priors = list())
Arguments
alpha_0 |
Double value for alpha_0 (count of failures). Must be 0 or greater. |
alpha_1 |
Double value for alpha_1 (count of successes side 1). Must be 0 or greater. |
alpha_2 |
Double value for alpha_2 (count of successes side 2). Must be 0 or greater. |
priors |
An optional list object that contains alpha00, alpha01, and alpha02.
Otherwise the function with use |
Details
TODO: This function currently only works in 3 dimensions. Should be extended into N dimensions in the future. Can use ... notation.
Value
tibble with columns alpha_0, alpha_1, and alpha_2
Examples
update_dirichlet(alpha_0 = 20, alpha_1 = 5, alpha_2 = 2)
sample_priors_list <- list(alpha00 = 2, alpha01 = 3, alpha02 = 5)
update_dirichlet(alpha_0 = 20, alpha_1 = 5, alpha_2 = 2, priors = sample_priors_list)
Update Gamma
Description
Updates Gamma Distribution with the Gamma-Exponential
conjugate prior update rule. Parameterized by k
and \theta
(not \alpha, \beta
)
Usage
update_gamma(k, theta, priors = list(), alternate_priors = FALSE)
Arguments
k |
Double value for |
theta |
Double value for |
priors |
An optional list object that contains k0 and
theta0. Otherwise the function will use |
alternate_priors |
Boolean Defaults to FALSE. Allows a user to specify alternate prior names so the same prior isn't required when multiple gamma distributions are used. |
Value
A list object that contains 'k' and 'theta'
Examples
update_gamma(k = 1, theta = 100, priors = list(k0 = 2, theta0 = 1000))
update_gamma(k = 10, theta = 200)
Validate Data Values
Description
Validates data values are all greater than 0.
Usage
validate_data_values(data_values)
Arguments
data_values |
List of named data values |
Value
None
Validate Input Column
Description
Validates the input column exists in the dataframe, is of the correct type, and that all values are greater than or equal to 0.
Usage
validate_input_column(column_name, input_df, greater_than_zero = TRUE)
Arguments
column_name |
String value of the column name |
input_df |
Dataframe containing option_name (str) and various other columns depending on the distribution type. See vignette for more details. |
greater_than_zero |
Boolean: Do all values in the column have to be greater than zero? |
Value
None
Validate Input DataFrame
Description
Validates the input dataframe has the correct type, correct required column names, that the distribution is valid, that the column types are correct, and that the column values are greater than or equal to 0 when they are numeric.
Usage
validate_input_df(input_df, distribution)
Arguments
input_df |
Dataframe containing option_name (str) and various other columns depending on the distribution type. See vignette for more details. |
distribution |
String of the distribution name |
Value
Bool TRUE if all checks pass.
Examples
input_df <- tibble::tibble(
option_name = c("A", "B"),
sum_clicks = c(1000, 1000),
sum_conversions = c(100, 120)
)
validate_input_df(input_df, "conversion_rate")
Validate Posterior Samples Dataframe
Description
Function fails if posterior is not shaped correctly.
Usage
validate_posterior_samples(posterior_samples)
Arguments
posterior_samples |
Tibble of data in long form with 2 columns 'option_name' and 'samples' |
Value
None
Validate Priors
Description
Validates list of priors against a vector of valid priors and if the values are not valid, default priors are returned.
Usage
validate_priors(priors, valid_priors, default_priors)
Arguments
priors |
List of named priors with double values. |
valid_priors |
A character vector of valid prior names. |
default_priors |
A list of default priors for the distribution. |
Value
A named list of valid priors for the distribution.
Validate With Respect To Option
Description
Verify that the option provided is in the poster_samples dataframe 'option_name' column. Raises error if not TRUE
Usage
validate_wrt_option(wrt_option, posterior_samples)
Arguments
wrt_option |
string name of the option |
posterior_samples |
Tibble returned from sample_from_posterior with 3 columns 'option_name', 'samples', and 'sample_id'. |
Value
None