---
title: "Micro files"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Micro files}
%\VignetteEncoding{UTF-8}
%\VignetteEngine{knitr::rmarkdown}
editor_options:
chunk_output_type: console
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
## Use case
`px_micro()` exists to support a specific use case for [Statistics Greenland](https://stat.gl/default.asp?lang=en).
They use it to create small PX-files to showcase and present metadata from a lager data set which cannot be made publicly available. See an example on [Statistic Greenland's microdata for Research and Analysis](https://bank.stat.gl/pxweb/en/GSmicro/).
## `px_micro()`
Apart from `px_save()`, `px_micro()` is the only other function that can save px objects as PX-files.
`px_micro()` turns a px object into many smaller PX-files, each containing a subset of the variables in the original px object.
## Input data
The basis of micro files are usually a data set which doesn't have a count variable (like most PX-files). `px_micro()` will instead create a count of each individual variable.
In this example we will use the built-in data data set `greenlanders`.
```{r, include = FALSE}
set.seed(0)
micro_dir <- file.path("micro_files")
unlink(micro_dir, recursive = TRUE)
```
```{r}
library(pxmake)
greenlanders |> dplyr::sample_n(10) |> dplyr::arrange_all()
```
## How to create micro files
Create a px object with `px()`, and pass it to `px_micro()`.
```{r}
# Create px object
x <- px(greenlanders)
# Create folder for micro files
micro_dir <- file.path("micro_files")
dir.create(micro_dir)
# Write micro files to folder
px_micro(x, out_dir = micro_dir)
```
The folder now contains three PX-files, one for each variable except 'age'.
```{r}
list.files(micro_dir)
```
The reason 'age' didn't get a PX-file is because it is the HEADING variable in `x`, and `px_micro()` creates a file for each non-HEADING variable. Instead the HEADNING variable is used in all the created PX-files.
```{r}
# Print HEADING variables
px_heading(x)
# Print non-HEADING variables
c(px_stub(x), px_figures(x))
```
In this case, we want 'cohort' to be heading, and to create a PX-file for 'gender', 'age' and 'municipality'.
```{r}
x2 <-
x |>
px_stub('age') |> # Change age to STUB
px_heading('cohort') # Change cohort to HEADING
```
```{r}
# Clear folder
unlink(file.path(micro_dir, "*.px"))
px_micro(x2, out_dir = micro_dir)
```
The folder now contains the files we wanted.
```{r}
list.files(micro_dir)
```
Each file contains one of the three variables as STUB, 'cohort' as HEADING, and a variable 'n' which is the count of each combination of the variables.
```{r}
px(file.path(micro_dir, 'age.px'))$data
px(file.path(micro_dir, 'gender.px'))$data
px(file.path(micro_dir, 'municipality.px'))$data
```
## Keyword values
In general the keyword values from the px object are carried over to the micro files. This is the case for keywords like 'MATRIX', 'SUBJECT-CODE', 'CONTACT', 'LANGUAGE', 'CODEPAGE', etc.
To change keywords across all the micro files, the easiest is to change them in the px object before calling `px_micro()`.
```{r, eval = FALSE}
# Change CONTACT in all micro files
x2 |>
px_contact("Johan Ejstrud") |>
px_micro(out_dir = micro_dir)
```
However, some keywords need to be changed individually for each micro file. To do so, create a data frame with the column 'variable' and a column for each px keyword to change.
```{r}
individual_keywords <- tibble::tribble(~variable , ~px_description,
"age" , "Age count 18-99",
"gender" , "Gender count",
"municipality", "Municipality 2024"
)
```
Supply this dataframe to the `keyword_values` argument of `px_micro()`.
```{r}
px_micro(x2, out_dir = micro_dir, keyword_values = individual_keywords)
```
DESCRIPTION is changed in the micro files:
```{r}
px(file.path(micro_dir, 'age.px')) %>% px_description()
px(file.path(micro_dir, 'gender.px')) %>% px_description()
px(file.path(micro_dir, 'municipality.px')) %>% px_description()
```
### Multilingual files
For multilingual files add a 'language' column to `keyword_values`.
```{r}
x3 <-
x2 |>
px_language("en") |>
px_languages(c("en", "kl"))
individual_keywords_ml <-
tibble::tribble(
~variable, ~language, ~px_description, ~px_matrix,
"age", "en", "Age count 18-99", "AGE",
"age", "kl", "Ukiut 18-99", NA,
"gender", "en", "Gender count", "GEN",
"gender", "kl", " Suiaassuseq", NA,
"municipality", "en", "Municipality 2024", "MUN",
"municipality", "kl", "Kommuni 2024", NA
)
px_micro(x3, out_dir = micro_dir, keyword_values = individual_keywords_ml)
```
Here 'px_description' varies for each language, and 'px_matrix' is only set for one of the languages, since it is not a language dependent keywords. For language independant keywords it doesn't matter which language the value is set for.
### Filenames
The filenames of the micro files are by default the name of the variable, however these can also be changed by passing a 'filename' column to 'keyword_values'
```{r}
individual_keywords2 <-
individual_keywords |>
dplyr::mutate(filename = paste0(variable, "_2024", ".px"))
# Clear folder
unlink(file.path(micro_dir, "*.px"))
px_micro(x2, out_dir = micro_dir, keyword_values = individual_keywords2)
list.files(micro_dir)
```