---
title: "Normalization Methods"
vignette: >
  %\VignetteIndexEntry{Normalization Methods}
  %\VignetteEngine{quarto::html}
  %\VignetteEncoding{UTF-8}
knitr:
  opts_chunk:
    crop: !expr 'rlang::is_installed(c("magick"))'
    collapse: true
    comment: '#>'
format:
  html:
    toc: true
    html-math-method: mathjax
bibliography: references.bib
---

```{r}
#| eval: !expr 'rlang::is_installed("magick")'
#| echo: false
knitr::knit_hooks$set(crop = knitr::hook_pdfcrop)
```

```{r}
#| label: setup
library(tidynorm)
library(dplyr)
```

In addition to the generic normalization functions in tidynorm (`norm_generic()`, `norm_track_generic()` and `norm_dct_generic()`), there are a number of convenience functions for a few established normalization methods.

## Lobanov [@lobanov]

tidynorm functions:

-   `norm_lobanov()`
-   `norm_track_lobanov()`
-   `norm_dct_lobanov()`

Lobanov normalization z-scores each formant.
If $F_{ij}$ is the $j^{th}$ token of the $i^{th}$ formant, and $\hat{F}_{ij}$ is its normalized value, then

$$
\hat{F}_{ij} = \frac{F_{ij} - L_i}{S_i}
$$

Where $L_i$ is the mean across the $i^{th}$ formant:

$$
L_i = \frac{1}{N}\sum_{j=1}^N F_{ij}
$$

And $S_i$ is the standard deviation across the $i^{th}$ formant:

$$
S_i = \sqrt{\frac{\sum_j(F_{ij}-L_i)^2}{N-1}}
$$
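
As a quick check on the formulas above, here is a minimal base R sketch that z-scores a single formant by hand. The values in `f1_hz` are made up for illustration, and this is not how `norm_lobanov()` is implemented internally; it only mirrors the arithmetic.

```{r}
# Hypothetical F1 values (Hz) for one speaker; not taken from `speaker_data`
f1_hz <- c(300, 450, 600, 750, 900)

# L_i: the mean of this formant
f1_mean <- mean(f1_hz)

# S_i: the standard deviation; sd() uses the N - 1 denominator
f1_sd <- sd(f1_hz)

# Normalized values: centered on 0 with a standard deviation of 1
f1_lobanov <- (f1_hz - f1_mean) / f1_sd
f1_lobanov
```

Because each formant gets its own $L_i$ and $S_i$, the same steps would be repeated separately for F2 and F3, and separately for each speaker.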

### Using the Lobanov normalization functions

#### On points

```{r}
point_norm <- speaker_data |>
  norm_lobanov(
    F1:F3,
    .by = speaker
  )
```

#### On tracks

```{r}
track_norm <- speaker_tracks |>
  norm_track_lobanov(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .time_col = t
  )
```

#### On DCT Coefficients

```{r}
dct_norm <- speaker_tracks |>
  reframe_with_dct(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .time_col = t
  ) |>
  norm_dct_lobanov(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .param_col = .param
  )
```

## Nearey Normalization [@neareyPhoneticFeatureSystems1978]

tidynorm functions:

-   `norm_nearey()`
-   `norm_track_nearey()`
-   `norm_dct_nearey()`

Nearey normalization first log-transforms formant values, then subtracts the grand mean of the log values, taken across all formants.
If $F_{ij}$ is the $j^{th}$ token of the $i^{th}$ formant, and $\hat{F}_{ij}$ is its normalized value, then

$$
\hat{F}_{ij} = \log(F_{ij}) - L
$$

$$
L = \frac{1}{MN}\sum_{i = 1}^M\sum_{j=1}^N \log(F_{ij})
$$

Because the grand mean is taken across all formants, it's important to report whether only F1 and F2 were used, or F1, F2 and F3.
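
To see why the choice of formants matters, the following base R sketch applies the formula above to a few made-up tokens, once with F1 through F3 and once with only F1 and F2. The matrix `nearey_hz` is hypothetical data, and the code only mirrors the arithmetic rather than tidynorm's implementation.

```{r}
# Hypothetical formant values (Hz): rows are tokens, columns are formants
nearey_hz <- cbind(
  F1 = c(300, 500, 700),
  F2 = c(2200, 1500, 1100),
  F3 = c(2900, 2500, 2600)
)

log_formants <- log(nearey_hz)

# L: a single grand mean over every log formant value
grand_mean_123 <- mean(log_formants)
nearey_123 <- log_formants - grand_mean_123

# The same tokens, normalized using only F1 and F2
grand_mean_12 <- mean(log_formants[, c("F1", "F2")])
nearey_12 <- log_formants[, c("F1", "F2")] - grand_mean_12

# Normalized F1 differs depending on which formants went into the grand mean
cbind(with_F3 = nearey_123[, "F1"], without_F3 = nearey_12[, "F1"])
```

The two versions differ by a constant shift, so the shape of the vowel space is unchanged, but values from the two versions are not directly comparable.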

### Using the Nearey normalization functions

#### On points

```{r}
point_norm <- speaker_data |>
  norm_nearey(
    F1:F3,
    .by = speaker
  )
```

#### On tracks

```{r}
track_norm <- speaker_tracks |>
  norm_track_nearey(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .time_col = t
  )
```

#### On DCT Coefficients

```{r}
dct_norm <- speaker_tracks |>
  # Log-transform first so the DCT is taken over log formant values
  mutate(across(F1:F3, log)) |>
  reframe_with_dct(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .time_col = t
  ) |>
  norm_dct_nearey(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .param_col = .param
  )
```

## Delta F [@johnsonDFMethodVocal2020]

tidynorm functions:

-   `norm_deltaF()`

-   `norm_track_deltaF()`

-   `norm_dct_deltaF()`

The $\Delta F$ normalization method is based on the average of formant spacing.
If $F_{ij}$ is the $j^{th}$ token of the $i^{th}$ formant, and $\hat{F}_{ij}$ is its normalized value, then

$$
\hat{F}_{ij} = \frac{F_{ij}}{S}
$$

$$
S = \frac{1}{MN} \sum_{i=1}^M\sum_{j=1}^N \frac{F_{ij}}{i-0.5}
$$

Because this method takes a weighted average across all formants, it's important to report whether only F1 and F2 were used, or F1, F2 and F3.
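
The weighting in $S$ can be worked through by hand. This base R sketch uses two made-up tokens in `tube_hz` (hypothetical values, not from `speaker_data`) and only mirrors the formula above; it is not tidynorm's implementation.

```{r}
# Hypothetical formant values (Hz): rows are tokens, columns are formants
tube_hz <- rbind(
  c(F1 = 500, F2 = 1500, F3 = 2500),
  c(F1 = 400, F2 = 1800, F3 = 2600)
)

# i - 0.5 for each formant number i: 0.5, 1.5, 2.5
formant_weights <- seq_len(ncol(tube_hz)) - 0.5

# Divide each formant column by its weight, then average over all values
delta_f <- mean(sweep(tube_hz, 2, formant_weights, "/"))

# Normalized values are the original formants divided by S
delta_f_norm <- tube_hz / delta_f
delta_f_norm
```

In the first token every weighted value is exactly 1000 Hz, so on its own it would give $S = 1000$. The second token moves the average slightly, which is why $S$ is estimated over all of a speaker's tokens.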

### Using the DeltaF normalization functions

#### On points

```{r}
point_norm <- speaker_data |>
  norm_deltaF(
    F1:F3,
    .by = speaker
  )
```

#### On tracks

```{r}
track_norm <- speaker_tracks |>
  norm_track_deltaF(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .time_col = t
  )
```

#### On DCT coefficients

```{r}
dct_norm <- speaker_tracks |>
  reframe_with_dct(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .time_col = t
  ) |>
  norm_dct_deltaF(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .param_col = .param
  )
```

## Watt & Fabricius [@wattEvaluationTechniqueImproving2002]

tidynorm functions:

-   `norm_wattfab()`

-   `norm_track_wattfab()`

-   `norm_dct_wattfab()`

The Watt & Fabricius method attempts to center vowel spaces on their "center of gravity".
The original Watt & Fabricius method involved calculating average F1 and F2 values for point vowels.
In tidynorm, a modified version has been implemented that simply uses the mean of each formant as its center of gravity.
If $F_{ij}$ is the $j^{th}$ token of the $i^{th}$ formant, and $\hat{F}_{ij}$ is its normalized value, then

$$
\hat{F}_{ij} = \frac{F_{ij}}{S_i}
$$

Where $S_i$ is the mean across the $i^{th}$ formant:

$$
S_i = \frac{1}{N} \sum_{j = 1}^N F_{ij}
$$
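
A minimal base R sketch of this modified version, using made-up values in `wf_hz`, shows the effect of dividing each formant by its own mean. It only mirrors the formula above and is not tidynorm's implementation.

```{r}
# Hypothetical formant values (Hz): rows are tokens, columns are formants
wf_hz <- cbind(
  F1 = c(300, 500, 700),
  F2 = c(2200, 1500, 1100)
)

# S_i: one mean per formant
formant_means <- colMeans(wf_hz)

# Divide each formant column by its own mean
wf_norm <- sweep(wf_hz, 2, formant_means, "/")

# After normalization, every formant has a mean of 1
colMeans(wf_norm)
```

A normalized value of 1 therefore sits at the center of the speaker's vowel space for that formant, with values above and below 1 falling on either side of it.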

### Using the Watt & Fabricius normalization functions

#### On points

```{r}
point_norm <- speaker_data |>
  norm_wattfab(
    F1:F3,
    .by = speaker
  )
```

#### On tracks

```{r}
track_norm <- speaker_tracks |>
  norm_track_wattfab(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .time_col = t
  )
```

#### On DCT coefficients

```{r}
dct_norm <- speaker_tracks |>
  reframe_with_dct(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .time_col = t
  ) |>
  norm_dct_wattfab(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .param_col = .param
  )
```

## Bark Difference [@syrdalPerceptualModelVowel1986]

tidynorm functions:

-   `norm_barkz()`

-   `norm_track_barkz()`

-   `norm_dct_barkz()`

The bark difference metric tries to normalize vowels on the basis of individual tokens.
First, formant data is converted to bark (see `hz_to_bark()`), then F3 is subtracted from F1 and F2.
If $F_{ij}$ is the $j^{th}$ token of the $i^{th}$ formant, and $\hat{F}_{ij}$ is its normalized value, then

$$
\hat{F}_{ij} = \text{bark}(F_{ij}) - L_j
$$

$$
L_j = \text{bark}(F_{3j})
$$
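
The token-by-token nature of this method can be sketched in base R. The helper `bark_sketch()` below is hypothetical: it uses Traunmüller's (1990) approximation as a stand-in, and `hz_to_bark()` may differ in detail. The values in `bark_hz` are likewise made up.

```{r}
# Hypothetical helper: one common Hz-to-bark approximation (Traunmueller 1990).
# Assumed for illustration only; `hz_to_bark()` may not match it exactly.
bark_sketch <- function(hz) 26.81 * hz / (1960 + hz) - 0.53

# Hypothetical formant values (Hz): rows are tokens, columns are formants
bark_hz <- cbind(
  F1 = c(300, 700),
  F2 = c(2200, 1100),
  F3 = c(2900, 2600)
)

bark_vals <- bark_sketch(bark_hz)

# L_j: each token's own bark-scaled F3, subtracted from all of its formants.
# The length-2 vector recycles down the rows, so row j is shifted by its own F3.
bark_diff <- bark_vals - bark_vals[, "F3"]
bark_diff
```

No speaker-level statistic is involved: each token is normalized against its own F3, so normalized F3 is always 0 and the information is carried by the F1 and F2 differences.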

### Using the Bark Difference normalization functions

#### On points

```{r}
point_norm <- speaker_data |>
  norm_barkz(
    F1:F3,
    .by = speaker
  )
```

#### On tracks

```{r}
track_norm <- speaker_tracks |>
  norm_track_barkz(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .time_col = t
  )
```

#### On DCT Coefficients

```{r}
dct_norm <- speaker_tracks |>
  # Convert to bark first so the DCT is taken over bark-scaled values
  mutate(
    across(F1:F3, hz_to_bark)
  ) |>
  reframe_with_dct(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .time_col = t
  ) |>
  norm_dct_barkz(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .param_col = .param
  )
```

## References