---
title: "Getting Started with cyclicwave"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with cyclicwave}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5
)
```

## Overview

`cyclicwave` is a modular toolkit for clustering time series data and detecting anomalies using classical, wavelet-based, Hilbert-based, and circular feature extraction methods. It supports DBSCAN and OPTICS clustering with consistent output formats, and it provides a comparison function for evaluating multiple feature/algorithm combinations in a single call.

We use the bundled `power_consumption` dataset, recorded at 10-minute intervals across three urban zones in Tetouan, Morocco.

```{r setup}
library(cyclicwave)
data(power_consumption)
```

## The data

Each row is a single time point. The last three columns are the zone-wise power consumption signals; the rest are weather variables that we ignore in this example.

```{r}
dim(power_consumption)
head(power_consumption, 3)
```

For this walkthrough we work with a 1000-row slice to keep everything fast. The same code runs on the full dataset; it just takes longer.

```{r}
pwr <- power_consumption[1:1000, ]
zones_matrix <- as.matrix(pwr[, 7:9])
```

## Step 1: reshape into long format

DBSCAN expects a 2D matrix where each row is one observation. We flatten the zones matrix into one long vector and attach a zone identifier to each value.

```{r}
flat <- flatten_with_zones(zones_matrix)
length(flat$values)
table(flat$zones)
```

After this step we have a single long vector with 3000 values (1000 time points for each of the 3 zones) and a matching `zones` vector of identifiers.

## Step 2: extract rolling features

Each observation needs more than a single value to be informative.
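
As a rough illustration of what a rolling feature is, the base-R sketch below computes a trailing mean and standard deviation for a short made-up signal. The helper name `rolling_sketch`, the toy data, and the trailing-window alignment with `NA` padding are assumptions made for illustration; `rolling_stats` may align or pad its windows differently.

```r
# Hypothetical helper (not part of cyclicwave): trailing rolling mean and sd.
# Positions that do not yet have a full window are left as NA.
rolling_sketch <- function(x, window_size) {
  n <- length(x)
  out_mean <- rep(NA_real_, n)
  out_sd <- rep(NA_real_, n)
  if (n >= window_size) {
    for (i in window_size:n) {
      w <- x[(i - window_size + 1):i]  # the window ending at position i
      out_mean[i] <- mean(w)
      out_sd[i] <- sd(w)
    }
  }
  list(mean = out_mean, sd = out_sd)
}

toy_signal <- c(2, 4, 6, 8, 10)  # made-up values, not from power_consumption
toy_roll <- rolling_sketch(toy_signal, window_size = 3)
toy_roll$mean  # two NA entries, then 4, 6, 8
toy_roll$sd    # two NA entries, then 2, 2, 2
```

Each position now carries a local level (the mean) and a local variability (the sd) in addition to its raw value, which is what gives a distance-based method more to work with.
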
We compute a rolling mean and standard deviation over a 10-point window:

```{r}
rolling <- rolling_stats(zones_matrix, window_size = 10,
                         stats = c("mean", "sd"))
```

`rolling_stats` returns a list of matrices. We flatten each one so that it aligns with the long-format values.

```{r}
raw_features <- cbind(
  zone  = flat$zones,
  value = flat$values,
  mavg  = as.vector(rolling$mean),
  sd    = as.vector(rolling$sd)
)
head(raw_features, 3)
```

The first column is the zone identifier; it is metadata, not a feature. We exclude it from clustering and normalization.

## Step 3: normalize

DBSCAN is distance-based, so feature scales matter. We z-score the three feature columns so that each contributes comparably to the distances.

```{r}
raw_features[, 2:4] <- normalize_features(raw_features[, 2:4],
                                          method = "zscore")
```

## Step 4: choose epsilon (visual heuristic)

DBSCAN needs an `eps` parameter: the neighborhood radius. The k-distance plot is a standard visual heuristic for choosing it: we look for an elbow in the curve of sorted k-nearest-neighbor distances.

```{r kdist-plot}
plot_k_distance(raw_features[, 2:4], k = 7)
```

## Step 5: run DBSCAN

```{r}
result <- run_dbscan(raw_features[, 2:4], eps = 0.3, min_pts = 7)
result$n_clusters
result$n_noise
```

The result is a list with a standardized structure; here we use its `cluster`, `n_clusters`, and `n_noise` elements.

## Step 6: evaluate

The Davies-Bouldin Index summarizes how compact and how well separated the clusters are. Lower values are better.

```{r}
davies_bouldin(raw_features[, 2:4], result$cluster)
```

We can visualize the partition by projecting the features onto the first two principal components and coloring the points by cluster.

```{r cluster-plot}
plot_clusters_pca(raw_features[, 2:4], result$cluster)
```

For function-level reference, see the help pages, e.g. `?run_dbscan`.
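
To make the Davies-Bouldin Index from Step 6 more concrete, here is a minimal base-R sketch of the textbook definition: for each cluster, take the worst-case ratio of summed within-cluster scatter to between-centroid distance, then average those ratios over clusters. The helper name `dbi_sketch`, the toy data, and the convention that label `0` marks noise are assumptions for illustration; the package's `davies_bouldin` may treat noise points or distances differently.

```r
# Hypothetical helper (not the package's davies_bouldin): textbook
# Davies-Bouldin Index with Euclidean distances.
# Assumption: points labelled `noise_label` are noise and are dropped.
dbi_sketch <- function(x, labels, noise_label = 0) {
  keep <- labels != noise_label
  x <- x[keep, , drop = FALSE]
  labels <- labels[keep]
  ids <- sort(unique(labels))
  k <- length(ids)
  stopifnot(k >= 2)  # the index is undefined for fewer than two clusters

  # One centroid per cluster (k rows, one column per feature).
  centroids <- do.call(rbind, lapply(ids, function(id) {
    colMeans(x[labels == id, , drop = FALSE])
  }))

  # Scatter: mean Euclidean distance from each member to its centroid.
  scatter <- sapply(seq_len(k), function(i) {
    pts <- x[labels == ids[i], , drop = FALSE]
    mean(sqrt(rowSums(sweep(pts, 2, centroids[i, ])^2)))
  })

  # Pairwise centroid distances and the ratio (S_i + S_j) / M_ij.
  sep <- as.matrix(dist(centroids))
  ratio <- outer(scatter, scatter, "+") / sep
  diag(ratio) <- -Inf  # never compare a cluster with itself

  mean(apply(ratio, 1, max))
}

# Made-up data: two tight, well-separated clusters plus one noise point.
toy_x <- rbind(c(0, 0), c(0, 2), c(10, 0), c(10, 2), c(5, 50))
toy_labels <- c(1, 1, 2, 2, 0)
toy_dbi <- dbi_sketch(toy_x, toy_labels)
toy_dbi  # (1 + 1) / 10 = 0.2
```

In the toy example each cluster has scatter 1 and the centroids are 10 units apart, so the index is small; moving the clusters closer together or spreading their members out would raise it.
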