--- title: "Extended Installation Guide" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Extended Installation Guide} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` No one knows everything — and you don’t need to! Many of our users are not computer scientists, so some of the installation steps and technical terms may feel unfamiliar at first. That’s completely normal. This guide is written to walk you through the process step by step, with explanations along the way. And remember: search engines (like DuckDuckGo) and chatbots (like ChatGPT) can be helpful allies if you ever get stuck or encounter a new concept. If you run into issues that aren’t covered here, please reach out to us at: rtext.contact@gmail.com, so that we can improve the instructions for everyone. To get started, follow these steps:

1) Install R and RStudio
Use the links below to download and install:
- [R (version 4.0.0 or higher)](https://CRAN.R-project.org/)
- [RStudio (recommended)](https://posit.co/download/rstudio-desktop/)

2) Install and set up the text package
The text package gives you access to HuggingFace Transformers through [reticulate](https://rstudio.github.io/reticulate/) (enabling advanced language analysis), which connects R to [Python](https://www.python.org/) (while remaining in an R environment). It uses Python packages like torch and transformers.

To make this easy, run:
- `textrpp_install()` – installs a ready-to-use Python/Conda environment
- `textrpp_initialize()` – activates the environment for use in R

This setup will handle most Python and system dependencies automatically – however, you may be isntructed to install system level dependencis as further described below. ## Set up a python environment for the text-package ```{r extended_installation_condaenv, eval = FALSE} library(text) library(reticulate) # Install text-required Python packages in a Conda environment text::textrpp_install() # Show available Conda environments reticulate::conda_list() # Initialize the installed Conda environment text::textrpp_initialize(save_profile = TRUE) # Test that textEmbed works textEmbed("hello") ``` ## System-level dependencies Different operating systems require different system-level dependencies, especially when working with Python packages that rely on compiled extensions or specialized libraries. Below is a summary of what’s commonly required and how to install it. ### Windows #### Microsoft C++ Build Tools Some Python packages (e.g., hdbscan, flair) require compilation with Visual C++. Windows users may need to manually install Microsoft C++ Build Tools. You must have C++ build tools for Python packages that need compilation.

Install the latest version of Microsoft C++ Build Tools:
1. Download and run the installer from https://visualstudio.microsoft.com/visual-cpp-build-tools/.
2. During installation, check:
- "Desktop development with C++" or "C++ build tools".
- Ensure "Windows 11 SDK" is also selected on the right menu.
3. Complete installation and restart your computer.

#### Terms of Service conflict -- Annaconda forge channel Python environments used by the text and talk packages rely on packages from the conda-forge channel.
You don’t need to install Anaconda manually – the textrpp_install() function will install Miniconda and set conda-forge as the default channel.
However, if you’ve previously installed Anaconda or changed channels, you might encounter ToS conflicts (Terms of Service).
If errors mention pkgs/main or tos accept, you can fix it by running the following in Terminal/Command Prompt (not R):
```{terminal windows, eval = FALSE} # Run this in your Terminal (not in R) conda config --add channels conda-forge conda config --set channel_priority strict conda config --remove channels defaults ``` ### MacOS On macOS, most system-level dependencies are typically pre-installed. For any missing components, the text package automatically detects them and provides clear instructions to guide the user through installation. #### 1 Homebrew & libomp Install Homebrew (if not already installed): ```{sh macos1, eval = FALSE} # Run this in your Terminal (not in R) /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)" ``` #### 2. Install libomp: ```{sh macos2, eval = FALSE} # Run this in your Terminal (not in R) brew install libomp ``` ### Ubuntu On recent Ubuntu distributions (e.g., 22.04+), most core dependencies are available. If needed, you can install them with: ```{sh ubuntu, eval = FALSE} # Run this in your Terminal (not in R) sudo apt update sudo apt install build-essential libomp-dev ``` build-essential: provides gcc, g++, and make
libomp-dev: for OpenMP support
## General troubleshooting ### 1. Check if you have install permissions Can you install an R package like dplyr? ```{r install_other_pacakge, eval = FALSE} install.packages("dplyr") ``` Can you install system-level tools like Python/miniconda? ```{r reticulate_installation, eval = FALSE} library(reticulate) reticulate::install_miniconda() ``` If you do not have premissions, please contact your administrator for advice. ### 2. Remembered to initialize the python environment. After restarting R, functions like textEmbed() can stop working again. Solution: Persist the initialization in your R profile: ```{r initialise, eval = FALSE} text::textrpp_initialize( condaenv = "textrpp_condaenv", refresh_settings = TRUE, save_profile = TRUE, ) ``` ### 3. Install the development version from GitHub. ```{r install_github, eval = FALSE} # install.packages("devtools") devtools::install_github("oscarkjell/text") ``` ### 4. Force reinstallation of the environment ```{r textrpp_install(), eval = FALSE} library(text) text::textrpp_install( update_conda = TRUE, force_conda = TRUE ) ``` ### 5. Install the python environment using reticulate See the article [Installing and Managing Python Environments with `reticulate`]( https://www.r-text.org/articles/reticulate.html), for detailed information. ### 6. Inspect diagnostic information If something isn't working right, it is a good start to examine what is installed and running on your system. ```{r textDiagnostics(), eval = FALSE} library(text) log <- text::textDiagnostics() log ``` Because the text package requires some system-level setup, installation is automatically verified on Windows, macOS, and Ubuntu through our [GitHub Actions](https://github.com/OscarKjell/text/actions). If you encounter any issues, please review the tests and check the workflow file for details on system-specific installations. To view the workflow file, select the three-dot menu on the right side of any GitHub Action run and choose View workflow file. This file specifies the operating systems, R versions, and additional libraries being tested. ## Virtual environments It is also possible to use virtual environments (although it is currently only tested on MacOS). ```{r extended_installation_venv, eval = FALSE} # Create a virtual environment with text required python packages. # Note that you have to provide a python path. text::textrpp_install_virtualenv(rpp_version = c("torch==1.7.1", "transformers==4.12.5", "numpy", "nltk"), python_path = "/usr/local/bin/python3.9", envname = "textrpp_virtualenv") # Initialize the virtual environment. text::textrpp_initialize(virtualenv = "textrpp_virtualenv", condaenv = NULL, save_profile = TRUE) ``` ## Solving OMP errors and R/Rstudio crashes Some macOS users may experience a crash when running functions like textEmbed() from the text package. This is due to a known conflict between multiple OpenMP libraries (e.g., libomp.dylib and libiomp5.dylib) used by Python packages such as torch and transformers. Workaround (Automatically Applied) To prevent this crash, the package sets the following environment variables when running on macOS: ``` Sys.setenv(OMP_NUM_THREADS = "1") Sys.setenv(OMP_MAX_ACTIVE_LEVELS = "1") Sys.setenv(KMP_DUPLICATE_LIB_OK = "TRUE") ``` These settings: - Limit the number of OpenMP threads - Avoid nested threading issues - Instruct macOS to ignore duplicate OpenMP libraries This workaround is safe for most users and enables smooth functionality, but note that: - It may slightly reduce parallel processing performance. - It bypasses a system-level issue rather than solving it permanently. The text package sets OpenMP-related environment variables for compatibility with PyTorch to avoid crashes due to libomp.dylib conflicts. You can skip this behavior by setting: ``` Sys.setenv(TEXT_SKIP_OMP_PATCH = "TRUE") library(text) ``` The exact way to install these packages may differ across systems. Please see:\ [Python](https://www.python.org/)\ [torch](https://pytorch.org/)\ [transformers](https://huggingface.co/docs/transformers/index)