---
title: "Normalization Methods"
vignette: >
  %\VignetteIndexEntry{Normalization Methods}
  %\VignetteEngine{quarto::html}
  %\VignetteEncoding{UTF-8}
knitr:
  opts_chunk:
    crop: !expr 'rlang::is_installed(c("magick"))'
    collapse: true
    comment: '#>'
format:
  html:
    toc: true
    html-math-method: mathjax
bibliography: references.bib
---

```{r}
#| eval: !expr 'rlang::is_installed("magick")'
#| echo: false
knitr::knit_hooks$set(crop = knitr::hook_pdfcrop)
```

```{r}
#| label: setup
library(tidynorm)
library(dplyr)
```

In addition to the generic normalization functions in tidynorm (`norm_generic()`, `norm_track_generic()` and `norm_dct_generic()`), there are a number of convenience functions for a few established normalization methods.

## Lobanov [@lobanov]

tidynorm functions:

- `norm_lobanov()`
- `norm_track_lobanov()`
- `norm_dct_lobanov()`

Lobanov normalization z-scores each formant. If $F_{ij}$ is the $j^{th}$ token of the $i^{th}$ formant, and $\hat{F}_{ij}$ is its normalized value, then

$$
\hat{F}_{ij} = \frac{F_{ij} - L_i}{S_i}
$$

Where $L_i$ is the mean across the $i^{th}$ formant:

$$
L_i = \frac{1}{N}\sum_{j=1}^N F_{ij}
$$

And $S_i$ is the standard deviation across the $i^{th}$ formant.
$$
S_i = \sqrt{\frac{\sum_{j=1}^N(F_{ij}-L_i)^2}{N-1}}
$$

### Using the Lobanov normalization functions

#### On points

```{r}
point_norm <- speaker_data |>
  norm_lobanov(
    F1:F3,
    .by = speaker
  )
```

#### On tracks

```{r}
track_norm <- speaker_tracks |>
  norm_track_lobanov(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .time_col = t
  )
```

#### On DCT Coefficients

```{r}
dct_norm <- speaker_tracks |>
  reframe_with_dct(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .time_col = t
  ) |>
  norm_dct_lobanov(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .param_col = .param
  )
```

## Nearey Normalization [@neareyPhoneticFeatureSystems1978]

tidynorm functions:

- `norm_nearey()`
- `norm_track_nearey()`
- `norm_dct_nearey()`

Nearey normalization first log-transforms the formant values, then subtracts the grand mean taken across all formants. If $F_{ij}$ is the $j^{th}$ token of the $i^{th}$ formant, and $\hat{F}_{ij}$ is its normalized value, then

$$
\hat{F}_{ij} = \log(F_{ij}) - L
$$

$$
L = \frac{1}{MN}\sum_{i = 1}^M\sum_{j=1}^N \log(F_{ij})
$$

Because the grand mean is taken across all formants, it's important to report whether just F1 and F2 were used, or F1, F2 and F3.

### Using the Nearey normalization functions

#### On points

```{r}
point_norm <- speaker_data |>
  norm_nearey(
    F1:F3,
    .by = speaker
  )
```

#### On tracks

```{r}
track_norm <- speaker_tracks |>
  norm_track_nearey(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .time_col = t
  )
```

#### On DCT Coefficients

```{r}
dct_norm <- speaker_tracks |>
  # log-transform the formants before computing the DCT
  mutate(across(F1:F3, log)) |>
  reframe_with_dct(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .time_col = t
  ) |>
  norm_dct_nearey(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .param_col = .param
  )
```

## Delta F [@johnsonDFMethodVocal2020]

tidynorm functions:

- `norm_deltaF()`
- `norm_track_deltaF()`
- `norm_dct_deltaF()`

The $\Delta F$ normalization method is based on the average of formant spacing.
If $F_{ij}$ is the $j^{th}$ token of the $i^{th}$ formant, and $\hat{F}_{ij}$ is its normalized value, then

$$
\hat{F}_{ij} = \frac{F_{ij}}{S}
$$

$$
S = \frac{1}{MN} \sum_{i=1}^M\sum_{j=1}^N \frac{F_{ij}}{i-0.5}
$$

Because this method takes a weighted average across all formants, it's important to report whether just F1 and F2 were used, or F1, F2 and F3.

### Using the DeltaF normalization functions

#### On points

```{r}
point_norm <- speaker_data |>
  norm_deltaF(
    F1:F3,
    .by = speaker
  )
```

#### On tracks

```{r}
track_norm <- speaker_tracks |>
  norm_track_deltaF(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .time_col = t
  )
```

#### On DCT coefficients

```{r}
dct_norm <- speaker_tracks |>
  reframe_with_dct(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .time_col = t
  ) |>
  norm_dct_deltaF(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .param_col = .param
  )
```

## Watt & Fabricius [@wattEvaluationTechniqueImproving2002]

tidynorm functions:

- `norm_wattfab()`
- `norm_track_wattfab()`
- `norm_dct_wattfab()`

The Watt & Fabricius method attempts to center vowel spaces on their "center of gravity". The original Watt & Fabricius method involved calculating average F1 and F2 values for point vowels. tidynorm implements a modified version that simply uses the mean of each formant (F1 and F2) as its center of gravity. If $F_{ij}$ is the $j^{th}$ token of the $i^{th}$ formant, and $\hat{F}_{ij}$ is its normalized value, then

$$
\hat{F}_{ij} = \frac{F_{ij}}{S_i}
$$

Where $S_i$ is the mean across the $i^{th}$ formant.
$$
S_i = \frac{1}{N} \sum_{j = 1}^N F_{ij}
$$

### Using the Watt & Fabricius normalization functions

#### On points

```{r}
point_norm <- speaker_data |>
  norm_wattfab(
    F1:F3,
    .by = speaker
  )
```

#### On tracks

```{r}
track_norm <- speaker_tracks |>
  norm_track_wattfab(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .time_col = t
  )
```

#### On DCT coefficients

```{r}
dct_norm <- speaker_tracks |>
  reframe_with_dct(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .time_col = t
  ) |>
  norm_dct_wattfab(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .param_col = .param
  )
```

## Bark Difference [@syrdalPerceptualModelVowel1986]

tidynorm functions:

- `norm_barkz()`
- `norm_track_barkz()`
- `norm_dct_barkz()`

The Bark difference metric normalizes vowels on the basis of individual tokens rather than speaker-level statistics. First, formant data is converted to Bark (see `hz_to_bark()`), then the Bark value of F3 is subtracted from those of F1 and F2. If $F_{ij}$ is the $j^{th}$ token of the $i^{th}$ formant, and $\hat{F}_{ij}$ is its normalized value, then

$$
\hat{F}_{ij} = \text{bark}(F_{ij}) - L_j
$$

$$
L_j = \text{bark}(F_{3j})
$$

### Using the Bark Difference normalization functions

#### On points

```{r}
point_norm <- speaker_data |>
  norm_barkz(
    F1:F3,
    .by = speaker
  )
```

#### On tracks

```{r}
track_norm <- speaker_tracks |>
  norm_track_barkz(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .time_col = t
  )
```

#### On DCT Coefficients

```{r}
dct_norm <- speaker_tracks |>
  # convert the formants from Hz to Bark before computing the DCT
  mutate(
    across(F1:F3, hz_to_bark)
  ) |>
  reframe_with_dct(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .time_col = t
  ) |>
  norm_dct_barkz(
    F1:F3,
    .by = speaker,
    .token_id_col = id,
    .param_col = .param
  )
```

## References