---
title: "dfMaker Details"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{dfMaker Details}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup, include = FALSE}
library(multimolang)
```

## Introduction

The `dfMaker()` function processes and organizes keypoint data generated by **'OpenPose'**. It also applies a linear transformation to the original 'OpenPose' coordinates, so that poses can be analyzed in a coordinate system defined by the user.

To define that system, the user selects one keypoint as the origin and two further keypoints that determine the new base vectors. This is particularly useful for aligning and scaling pose data across different contexts or frames.

The `dfMaker()` function is designed to handle large-scale datasets efficiently, providing structured outputs in formats like Parquet and CSV for seamless integration into data pipelines.

### Parameter Explanation

- **`input.folder`**: Path to the folder containing 'OpenPose' JSON files.
- **`config.path`**: Path to the configuration file for extracting metadata (optional).
- **`output.file`**: Name of the output file.
- **`output.path`**: Directory where the output file will be saved.
- **`no_save`**: If `TRUE`, does not save the result to a file.
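- **`fast_scaling`**: If `TRUE`, applies the simplified fast scaling transformation, which uses only the origin and the primary base vector. If `FALSE`, applies the full transformation with both base vectors.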
- **`transformation_coords`**: Numeric vector of length 4 specifying the transformation coordinates.

## Additional Details

The function depends on the **`arrow`** package for efficient reading and writing of JSON and Parquet files. Make sure you have it installed:

```{r install_arrow, eval = FALSE}
install.packages("arrow")
```
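If you would rather check for the package before installing anything, a minimal sketch using base R's `requireNamespace()` could look like this. Nothing here is specific to `dfMaker()`; the variable name `has_arrow` is only illustrative.

```r
# Check for 'arrow' without attaching it; requireNamespace() returns TRUE or FALSE
has_arrow <- requireNamespace("arrow", quietly = TRUE)

if (!has_arrow) {
  message("Package 'arrow' is not installed; run install.packages(\"arrow\") first.")
}
```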

### Linear Transformation in `dfMaker()`

The `dfMaker()` function applies a linear transformation to normalize and align keypoints data within a custom coordinate system. This transformation standardizes poses across different frames or individuals by defining specific keypoints as the origin and base vectors.

#### Fast Scaling Mode (`fast_scaling = TRUE`)

When `fast_scaling = TRUE`, the transformation is simplified and uses only the **pose keypoints** (the first set of keypoints). Because it needs only the origin and one reference keypoint, it is cheaper to compute than the full transformation.

**Steps:**

1. **Define the Origin (`o_point`):**

   - Select a keypoint to serve as the origin $(0, 0)$ in the new coordinate system.
   - Denote the coordinates of the origin as $(x_{\text{origin}}, y_{\text{origin}})$.

2. **Calculate the Primary Base Vector ($\mathbf{v_i}$):**

   - Choose a keypoint (`i_point`) to define the primary base vector.
   - Compute:
     $$
     \mathbf{v_i} = (x_i, y_i) - (x_{\text{origin}}, y_{\text{origin}})
     $$
     where $(x_i, y_i)$ are the coordinates of `i_point`.

3. **Compute the Scaling Factor (`s`):**

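   - The scaling factor is the x-component of $\mathbf{v_i}$:
     $$
     s = v_{i,x}
     $$
   - Both coordinates are divided by this factor, so `i_point` lies at $x' = 1$ in the new system. The factor must be non-zero, which requires `o_point` and `i_point` to have different x-coordinates.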

4. **Apply the Transformation:**

   - For each keypoint $(x, y)$, compute the transformed coordinates:
   
     $$
     x' = \frac{x - x_{\text{origin}}}{s}, \quad y' = -\frac{y - y_{\text{origin}}}{s}
     $$
     
   - The y-coordinate is negated to adjust for coordinate system differences (e.g., image coordinates have the y-axis pointing downwards).

**Summary Equation:**

$$
\begin{cases}
x' = \dfrac{x - x_{\text{origin}}}{s} \\
y' = -\dfrac{y - y_{\text{origin}}}{s}
\end{cases}
$$
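The steps above can be sketched in a few lines of base R. This is an illustration of the formulas only, not the package's internal code: the helper `fast_scale()`, the column names `nx` and `ny`, and the keypoint coordinates are all made up for the example.

```r
# Hypothetical pose keypoints in image coordinates (pixels); values are made up
keypoints <- data.frame(
  point = 1:5,
  x = c(320, 300, 340, 280, 400),
  y = c(180, 220, 220, 260, 200)
)

# Illustrative helper (an assumption, not part of multimolang)
fast_scale <- function(kp, o_point, i_point) {
  origin <- kp[kp$point == o_point, c("x", "y")]
  i_xy   <- kp[kp$point == i_point, c("x", "y")]

  # Scaling factor: x-component of the primary base vector v_i
  s <- i_xy$x - origin$x
  if (s == 0) stop("v_i has a zero x-component; the scaling factor is undefined")

  kp$nx <- (kp$x - origin$x) / s
  kp$ny <- -(kp$y - origin$y) / s  # negated: the image y-axis points downwards
  kp
}

scaled <- fast_scale(keypoints, o_point = 1, i_point = 5)
scaled
```

With these numbers $s = 80$, so the origin keypoint maps to $(0, 0)$ and keypoint 5 maps to $x' = 1$.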


#### Full Transformation Mode (`fast_scaling = FALSE`)

When `fast_scaling = FALSE`, the transformation uses both primary and secondary base vectors to perform a full affine transformation, which can handle rotations and scaling in both axes.

**Additional Steps:**

1. **Calculate the Secondary Base Vector ($\mathbf{v_j}$):**

   - If `i_point` and `j_point` are different:
   
     $$
     \mathbf{v_j} = (x_j, y_j) - (x_{\text{origin}}, y_{\text{origin}})
     $$
     
   - If `i_point` and `j_point` are the same (to maintain orthogonality):
   
     $$
     \mathbf{v_j} = (-v_{i,y}, v_{i,x})
     $$
     
     This computes a vector perpendicular to $\mathbf{v_i}$.

2. **Construct the Transformation Matrix ($M_t$):**

   $$
   M_t = \begin{pmatrix} v_{i,x} & v_{j,x} \\ v_{i,y} & v_{j,y} \end{pmatrix}
   $$

3. **Compute the Inverse Transformation Matrix ($M_t^{-1}$) Using Cramer's Rule:**

   - **Determinant:**
   
     $$
     \det(M_t) = v_{i,x} \cdot v_{j,y} - v_{j,x} \cdot v_{i,y}
     $$
     
   - **Inverse Matrix:**
   
     $$
     M_t^{-1} = \frac{1}{\det(M_t)} \begin{pmatrix} v_{j,y} & -v_{j,x} \\ -v_{i,y} & v_{i,x} \end{pmatrix}
     $$

4. **Apply the Transformation:**

   - For each keypoint $(x, y)$, compute the relative position:
   
     $$
     \begin{pmatrix} x_{\text{rel}} \\ y_{\text{rel}} \end{pmatrix} = \begin{pmatrix} x - x_{\text{origin}} \\ y - y_{\text{origin}} \end{pmatrix}
     $$

   - Transform the coordinates:
   
     $$
     \begin{pmatrix} x' \\ y' \end{pmatrix} = M_t^{-1} \cdot \begin{pmatrix} x_{\text{rel}} \\ y_{\text{rel}} \end{pmatrix}
     $$
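The full mode can likewise be sketched in base R. Again, this only mirrors the equations in this section under assumed inputs: `full_transform()` is a hypothetical helper, keypoints are passed as plain numeric vectors and a two-column matrix, and the coordinates are invented.

```r
# Illustrative helper (an assumption, not part of multimolang)
full_transform <- function(xy, origin, i_xy, j_xy = i_xy) {
  v_i <- i_xy - origin
  # Same point for i and j: use the perpendicular of v_i as the second base vector
  v_j <- if (all(i_xy == j_xy)) c(-v_i[2], v_i[1]) else j_xy - origin

  # Columns of M_t are the base vectors
  M_t <- cbind(v_i, v_j)
  det_M <- M_t[1, 1] * M_t[2, 2] - M_t[1, 2] * M_t[2, 1]
  if (det_M == 0) stop("Base vectors are collinear; M_t is not invertible")

  # Closed-form inverse of a 2x2 matrix (matrix() fills column by column)
  M_inv <- matrix(c(M_t[2, 2], -M_t[2, 1], -M_t[1, 2], M_t[1, 1]), nrow = 2) / det_M

  # Shift every row of xy by the origin, then apply the inverse
  rel <- sweep(xy, 2, origin)
  t(M_inv %*% t(rel))
}

# Made-up coordinates: rows are i_point, j_point, and one further keypoint
xy <- rbind(c(200, 100), c(100, 50), c(150, 75))
transformed <- full_transform(
  xy,
  origin = c(100, 100),
  i_xy   = c(200, 100),
  j_xy   = c(100, 50)
)
transformed
```

In this example the rows for `i_point` and `j_point` come out as $(1, 0)$ and $(0, 1)$, which is what defining them as base vectors means. Base R's `solve()` would give the same inverse; the closed form is written out here only to match the formulas above.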

#### Example Applied in the Function

Using `transformation_coords = c(1, 1, 5, 5)` and `fast_scaling = TRUE`:

- **Origin (`o_point = 1`):** Keypoint index 1.
- **Primary Base Vector (`i_point = 5`):** Keypoint index 5.
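- **Secondary Base Vector (`j_point = 5`):** Same as `i_point`. In full transformation mode this would make $\mathbf{v_j}$ perpendicular to $\mathbf{v_i}$; in fast scaling mode $\mathbf{v_j}$ does not enter the formula.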

$$
\begin{cases}
x' = \dfrac{x - x_{\text{origin}}}{v_{i,x}} \\
y' = -\dfrac{y - y_{\text{origin}}}{v_{i,x}}
\end{cases}
$$

**Implications:**

- The x-coordinate is scaled so that `i_point` lies at $x' = 1$, which establishes a unit length along the x-axis.
- The y-coordinate is scaled and inverted to maintain proportion and adjust for coordinate orientation.
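
As a worked illustration with invented pixel coordinates (not taken from the package's example data), suppose the origin keypoint is at $(100, 50)$ and `i_point` is at $(150, 60)$. A keypoint at $(125, 90)$ then maps as follows:

$$
s = 150 - 100 = 50, \qquad
x' = \frac{125 - 100}{50} = 0.5, \qquad
y' = -\frac{90 - 50}{50} = -0.8
$$

The `i_point` itself maps to $x' = 1$ and $y' = -\frac{60 - 50}{50} = -0.2$.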

## Error Handling

- If a JSON file is empty or does not have the expected format, the function skips it and displays a message.
- If `output.file` is `NULL` and multiple unique `id` values are found, the function raises an error requesting an explicit file name.
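
To anticipate which files will be skipped as empty, you can inspect a folder beforehand. The sketch below uses only base R and a throwaway folder; the folder name and file names are hypothetical, and the check covers empty files only, not malformed ones.

```r
# Build a throwaway folder with one non-empty and one empty JSON file
demo_dir <- file.path(tempdir(), "openpose_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines('{"people": []}', file.path(demo_dir, "frame_000000000000_keypoints.json"))
file.create(file.path(demo_dir, "frame_000000000001_keypoints.json"))

# List the JSON files and flag those with zero bytes
json_files <- list.files(demo_dir, pattern = "\\.json$", full.names = TRUE)
is_empty <- file.size(json_files) == 0

# Names of the files that contain no data
basename(json_files[is_empty])
```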

## Notes on NewsScape Data

If you are working with data from the **UCLA NewsScape** archive, you can use the `config.path` parameter to specify how to extract metadata from the filenames.

**Example Configuration File (`config.json`):**

```json
{
    "extract_datetime": true,
    "extract_time": true,
    "extract_exp_search": true,
    "extract_country_code": true,
    "extract_network_code": true,
    "extract_program_name": true,
    "extract_time_range": true,
    "timezone": "America/Los_Angeles"
}
```
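
One way to use this configuration is to write it to a file and pass the file's location through `config.path`. The sketch below writes the JSON shown above with base R; the `dfMaker()` call is left commented out because `newsscape_folder` is a placeholder for your own folder of 'OpenPose' JSON files.

```r
# Write the example configuration to a temporary file (base R only)
config_path <- file.path(tempdir(), "config.json")
writeLines(c(
  "{",
  '    "extract_datetime": true,',
  '    "extract_time": true,',
  '    "extract_exp_search": true,',
  '    "extract_country_code": true,',
  '    "extract_network_code": true,',
  '    "extract_program_name": true,',
  '    "extract_time_range": true,',
  '    "timezone": "America/Los_Angeles"',
  "}"
), config_path)

# Hypothetical call; `newsscape_folder` is a placeholder, not a package object
# df <- dfMaker(
#   input.folder = newsscape_folder,
#   config.path  = config_path,
#   no_save      = TRUE
# )
```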

## Example

```{r full_example, eval = FALSE}
# Define paths to example data included with the package
input_folder <- system.file("extdata/eg/o1", package = "multimolang")
output_file <- file.path(tempdir(), "processed_data.csv")
output_path <- tempdir()  # Use a temporary directory for writing output during examples

# Run dfMaker()
df <- dfMaker(
  input.folder = input_folder,
  output.file = output_file,
  output.path = output_path,
  no_save = FALSE,
  fast_scaling = TRUE,
  transformation_coords = c(1, 1, 5, 5)
)

# View the first few rows of the resulting data frame
head(df)
```

## Conclusion

The `dfMaker()` function simplifies the processing of large volumes of keypoint data and makes it easier to integrate that data into analysis and machine learning workflows. Its two modes, fast scaling and full transformation, give flexibility in how keypoints are normalized and aligned. Whether the data comes from sources like the UCLA 'NewsScape' archive or feeds into a larger data pipeline, `dfMaker()` prepares keypoint data consistently for subsequent analysis.

## References

1. **'OpenPose'**:
   Cao, Z., Hidalgo, G., Simon, T., Wei, S.-E., & Sheikh, Y. (2019). "OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields." *IEEE Transactions on Pattern Analysis and Machine Intelligence*, 43(1), 172–186. <https://doi.org/10.1109/TPAMI.2019.2929257>

2. **NewsScape and the Distributed Little Red Hen Lab**:
   Uhrig, P. (2018). "NewsScape and the Distributed Little Red Hen Lab: A Digital Infrastructure for the Large-Scale Analysis of TV Broadcasts." In *Anglistentag 2017 in Regensburg: Proceedings*, pp. 99–114. Wissenschaftlicher Verlag Trier.

3. **Apache Arrow**:
   Apache Arrow (2020). "Apache Arrow: A Cross-Language Development Platform for In-Memory Data." <https://arrow.apache.org/>