---
title: "Micro files"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Micro files}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  chunk_output_type: console
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Use case

`px_micro()` exists to support a specific use case for [Statistics Greenland](https://stat.gl/default.asp?lang=en). They use it to create small PX-files that showcase and present metadata from a larger data set which cannot be made publicly available.

See an example on [Statistics Greenland's microdata for Research and Analysis](https://bank.stat.gl/pxweb/en/GSmicro/).

## `px_micro()`

Apart from `px_save()`, `px_micro()` is the only function that can save px objects as PX-files.

`px_micro()` turns a px object into many smaller PX-files, each containing a subset of the variables in the original px object.

## Input data

Micro files are usually based on a data set that, unlike most PX-files, doesn't have a count variable. `px_micro()` instead creates a count for each individual variable.

In this example we will use the built-in data set `greenlanders`.

```{r, include = FALSE}
set.seed(0)
micro_dir <- file.path("micro_files")
unlink(micro_dir, recursive = TRUE)
```

```{r}
library(pxmake)

greenlanders |>
  dplyr::sample_n(10) |>
  dplyr::arrange_all()
```

## How to create micro files

Create a px object with `px()`, and pass it to `px_micro()`.

```{r}
# Create px object
x <- px(greenlanders)

# Create folder for micro files
micro_dir <- file.path("micro_files")
dir.create(micro_dir)

# Write micro files to folder
px_micro(x, out_dir = micro_dir)
```

The folder now contains three PX-files, one for each variable except 'age'.

```{r}
list.files(micro_dir)
```

'age' didn't get a PX-file because it is the HEADING variable in `x`, and `px_micro()` creates a file for each non-HEADING variable.
Instead, the HEADING variable is used in all the created PX-files.

```{r}
# Print HEADING variables
px_heading(x)

# Print non-HEADING variables
c(px_stub(x), px_figures(x))
```

In this case, we want 'cohort' to be the HEADING variable, and to create a PX-file for each of 'gender', 'age' and 'municipality'.

```{r}
x2 <-
  x |>
  px_stub('age') |>       # Change age to STUB
  px_heading('cohort')    # Change cohort to HEADING
```

```{r}
# Clear folder
unlink(file.path(micro_dir, "*.px"))

px_micro(x2, out_dir = micro_dir)
```

The folder now contains the files we wanted.

```{r}
list.files(micro_dir)
```

Each file contains one of the three variables as STUB, 'cohort' as HEADING, and a variable 'n', which is the count of each combination of the two.

```{r}
px(file.path(micro_dir, 'age.px'))$data
px(file.path(micro_dir, 'gender.px'))$data
px(file.path(micro_dir, 'municipality.px'))$data
```

## Keyword values

In general, the keyword values from the px object are carried over to the micro files. This is the case for keywords such as 'MATRIX', 'SUBJECT-CODE', 'CONTACT', 'LANGUAGE' and 'CODEPAGE'.

To change keywords across all the micro files, the easiest approach is to change them in the px object before calling `px_micro()`.

```{r, eval = FALSE}
# Change CONTACT in all micro files
x2 |>
  px_contact("Johan Ejstrud") |>
  px_micro(out_dir = micro_dir)
```

However, some keywords need to be changed individually for each micro file. To do so, create a data frame with the column 'variable' and a column for each px keyword to change.

```{r}
individual_keywords <-
  tibble::tribble(~variable     , ~px_description,
                  "age"         , "Age count 18-99",
                  "gender"      , "Gender count",
                  "municipality", "Municipality 2024"
                  )
```

Supply this data frame to the `keyword_values` argument of `px_micro()`.
```{r}
px_micro(x2, out_dir = micro_dir, keyword_values = individual_keywords)
```

DESCRIPTION is changed in the micro files:

```{r}
px(file.path(micro_dir, 'age.px')) |> px_description()
px(file.path(micro_dir, 'gender.px')) |> px_description()
px(file.path(micro_dir, 'municipality.px')) |> px_description()
```

### Multilingual files

For multilingual files, add a 'language' column to `keyword_values`.

```{r}
x3 <-
  x2 |>
  px_language("en") |>
  px_languages(c("en", "kl"))

individual_keywords_ml <-
  tibble::tribble(
    ~variable     , ~language, ~px_description    , ~px_matrix,
    "age"         , "en"     , "Age count 18-99"  , "AGE",
    "age"         , "kl"     , "Ukiut 18-99"      , NA,
    "gender"      , "en"     , "Gender count"     , "GEN",
    "gender"      , "kl"     , "Suiaassuseq"      , NA,
    "municipality", "en"     , "Municipality 2024", "MUN",
    "municipality", "kl"     , "Kommuni 2024"     , NA
  )

px_micro(x3, out_dir = micro_dir, keyword_values = individual_keywords_ml)
```

Here 'px_description' varies by language, while 'px_matrix' is set for only one of the languages, since it is not a language-dependent keyword. For language-independent keywords it doesn't matter which language the value is set for.

### Filenames

By default, each micro file is named after its variable. The filenames can be changed by adding a 'filename' column to `keyword_values`.

```{r}
individual_keywords2 <-
  individual_keywords |>
  dplyr::mutate(filename = paste0(variable, "_2024", ".px"))

# Clear folder
unlink(file.path(micro_dir, "*.px"))

px_micro(x2, out_dir = micro_dir, keyword_values = individual_keywords2)

list.files(micro_dir)
```
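
Since the 'filename' column is filled in by hand, it may be worth checking it before calling `px_micro()`. The sketch below uses only base R. `check_filenames()` is a hypothetical helper, not part of pxmake. It assumes that two variables sharing a filename would end up writing to the same file, and that filenames are expected to carry the `.px` extension explicitly, as in the example above.

```{r}
# Hypothetical helper (not part of pxmake): inspects the 'filename' column of a
# keyword_values data frame and reports names that look problematic.
check_filenames <- function(keyword_values) {
  stopifnot(all(c("variable", "filename") %in% names(keyword_values)))

  # Multilingual tables repeat each variable once per language, so reduce to
  # distinct variable/filename pairs before looking for clashes
  pairs <- unique(keyword_values[, c("variable", "filename")])

  list(
    # Filenames used by more than one variable
    duplicated = unique(pairs$filename[duplicated(pairs$filename)]),
    # Filenames that do not end in '.px'
    missing_extension = pairs$filename[!grepl("\\.px$", pairs$filename)]
  )
}

# Example input where 'municipality' accidentally reuses the filename of 'age'
keyword_values <- data.frame(
  variable = c("age", "gender", "municipality"),
  filename = c("age_2024.px", "gender_2024.px", "age_2024.px"),
  stringsAsFactors = FALSE
)

result <- check_filenames(keyword_values)
result$duplicated
result$missing_extension
```

In this example the third row reuses the name given to 'age', so it is reported under `duplicated`, while `missing_extension` is empty because every name ends in `.px`.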