--- author: "Joseph Larmarange" title: "About missing values: regular NAs, tagged NAs and user NAs" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{About missing values: regular NAs, tagged NAs and user NAs} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} --- In base **R**, missing values are indicated using the specific value `NA`. **Regular NAs** could be used with any type of vector (double, integer, character, factor, Date, etc.). Other statistical software have implemented ways to differentiate several types of missing values. **Stata** and **SAS** have a system of **tagged NAs**, where NA values are tagged with a letter (from a to z). **SPSS** allows users to indicate that certain non-missing values should be treated in some analysis as missing (**user NAs**). The `haven` package implements **tagged NAs** and **user NAs** in order to keep this information when importing files from **Stata**, **SAS** or **SPSS**. ```{r} library(labelled) ``` ## Tagged NAs ### Creation and tests **Tagged NAs** are proper `NA` values with a tag attached to them. They can be created with `tagged_na()`. The attached tag should be a single letter, lowercase (a-z) or uppercase (A-Z). ```{r} x <- c(1:5, tagged_na("a"), tagged_na("z"), NA) ``` For most **R** functions, tagged NAs are just considered as regular NAs. By default, they are just printed as any other regular NA. ```{r} x is.na(x) ``` To show/print their tags, you need to use `na_tag()`, `print_tagged_na()` or `format_tagged_na()`. ```{r} na_tag(x) print_tagged_na(x) format_tagged_na(x) ``` To test if a certain NA is a regular NA or a tagged NA, you should use `is_regular_na()` or `is_tagged_na()`. ```{r} is.na(x) is_tagged_na(x) # You can test for specific tagged NAs with the second argument is_tagged_na(x, "a") is_regular_na(x) ``` Tagged NAs could be defined **only** for double vectors. If you add a tagged NA to a character vector, it will be converted into a regular NA. If you add a tagged NA to an integer vector, the vector will be converted into a double vector. ```{r, error=TRUE} y <- c("a", "b", tagged_na("z")) y is_tagged_na(y) format_tagged_na(y) z <- c(1L, 2L, tagged_na("a")) typeof(z) format_tagged_na(z) ``` ### Unique values, duplicates and sorting with tagged NAs By default, functions such as `base::unique()`, `base::duplicated()`, `base::order()` or `base::sort()` will treat tagged NAs as the same thing as a regular NA. You can use `unique_tagged_na()`, `duplicated_tagged_na()`, `order_tagged_na()` and `sort_tagged_na()` as alternatives that will treat two tagged NAs with different tags as separate values. ```{r} x <- c(1, 2, tagged_na("a"), 1, tagged_na("z"), 2, tagged_na("a"), NA) x %>% print_tagged_na() unique(x) %>% print_tagged_na() unique_tagged_na(x) %>% print_tagged_na() duplicated(x) duplicated_tagged_na(x) sort(x, na.last = TRUE) %>% print_tagged_na() sort_tagged_na(x) %>% print_tagged_na() ``` ### Tagged NAs and value labels It is possible to define value labels for tagged NAs. ```{r} x <- c(1, 0, 1, tagged_na("r"), 0, tagged_na("d"), tagged_na("z"), NA) val_labels(x) <- c( no = 0, yes = 1, "don't know" = tagged_na("d"), refusal = tagged_na("r") ) x ``` When converting such labelled vector into factor, tagged NAs are, by default, converted into regular NAs (it is not possible to define tagged NAs with factors). ```{r} to_factor(x) ``` However, the option `explicit_tagged_na` of `to_factor()` allows to transform tagged NAs into explicit factor levels. ```{r} to_factor(x, explicit_tagged_na = TRUE) to_factor(x, levels = "prefixed", explicit_tagged_na = TRUE) ``` ### Conversion into user NAs Tagged NAs can be converted into user NAs with `tagged_na_to_user_na()`. ```{r} tagged_na_to_user_na(x) tagged_na_to_user_na(x, user_na_start = 10) ``` Use `tagged_na_to_regular_na()` to convert tagged NAs into regular NAs. ```{r} tagged_na_to_regular_na(x) tagged_na_to_regular_na(x) %>% is_tagged_na() ``` ## User NAs `haven` introduced an `haven_labelled_spss` class to deal with user defined missing values in a similar way as **SPSS**. In such case, additional attributes will be used to indicate with values should be considered as missing, but such values will not be stored as internal `NA` values. You should note that most R function will not take this information into account. Therefore, you will have to convert missing values into `NA` if required before analysis. These defined missing values could co-exist with internal `NA` values. ### Creation User NAs could be created directly with `labelled_spss()`. You can also manipulate them with `na_values()` and `na_range()`. ```{r} v <- labelled(c(1, 2, 3, 9, 1, 3, 2, NA), c(yes = 1, no = 3, "don't know" = 9)) v na_values(v) <- 9 v na_values(v) <- NULL v na_range(v) <- c(5, Inf) na_range(v) v ``` NB: you cant also use `set_na_range()` and `set_na_values()` for a `dplyr`-like syntax. ```{r} library(dplyr) # setting value labels and user NAs df <- tibble(s1 = c("M", "M", "F", "F"), s2 = c(1, 1, 2, 9)) %>% set_value_labels(s2 = c(yes = 1, no = 2)) %>% set_na_values(s2 = 9) df$s2 # removing user NAs df <- df %>% set_na_values(s2 = NULL) df$s2 ``` ### Tests Note that `is.na()` will return `TRUE` for user NAs. Use `is_user_na()` to test if a specific value is a user NA and `is_regular_na()` to test if it is a regular NA. ```{r} v is.na(v) is_user_na(v) is_regular_na(v) ``` ### Conversion For most **R** functions, user NAs values are **still** regular values. ```{r} x <- c(1:5, 11:15) na_range(x) <- c(10, Inf) val_labels(x) <- c("dk" = 11, "refused" = 15) x mean(x) ``` You can convert user NAs into regular NAs with `user_na_to_na()` or `user_na_to_regular_na()` (both functions are identical). ```{r} user_na_to_na(x) mean(user_na_to_na(x), na.rm = TRUE) ``` Alternatively, if the vector is numeric, you can convert user NAs into tagged NAs with `user_na_to_tagged_na()`. ```{r} user_na_to_tagged_na(x) mean(user_na_to_tagged_na(x), na.rm = TRUE) ``` Finally, you can also remove user NAs definition without converting these values to `NA`, using `remove_user_na()`. ```{r} remove_user_na(x) mean(remove_user_na(x)) ```