--- title: "A-short-introduction-to-KOFM" output: rmarkdown::html_vignette bibliography: ref.bib vignette: > %\VignetteIndexEntry{A-short-introduction-to-KOFM} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r, warning=F, message=FALSE, eval=TRUE} library(KOFM) ``` ## 1. A testing problem in tensor factor models A tensor time series is a series of tensor-valued [@KoldaBader] observations at each timestamp. The focus of 'KOFM' is to test if a given tensor time series $\mathcal{Y}_t$ for $t=1,2,\dots,T$ has a Kronecker product structure, i.e., $$ H_0: \{\mathcal{Y}_t\} \text{ has a Kronecker product structure};\\ H_1: \{\mathcal{Y}_t\} \text{ has no Kronecker product structure along a set of indices } \mathcal{A}. $$ As a simple non-trivial example, consider an order-2 tensor time series $\mathbf{Y}_t$ for $t=1,2,\dots,T$, i.e., a matrix time series, with the set $\mathcal{A}=\{1,2\}$. The above hypothesis test is equivalent to $$ H_0: \{\mathbf{Y}_t\} \text{ follows a matrix factor model such that } \mathbf{Y}_t= \mathbf{A}_1 \mathbf{F}_t \mathbf{A}_2 + \mathbf{E}_t ; \\ H_1: \text{Only the vectorised data } \{\textbf{vec}(\mathbf{Y}_t)\} \text{ has a factor structure such that } \textbf{vec}(\mathbf{Y}_t) = \mathbf{A}_V \mathbf{f}_t + \mathbf{e}_t . $$ Under both $H_0$ and $H_1$, the reshaped data follows a Tucker-decomposition tensor factor model. Hence we quickly demonstrate the tensor reshape operator, with the 'ten_reshape' and 'ten_reshape_back' functions, after which we will showcase how the test works using the 'testKron_A' and 'testKron_auto' functions. ## 2. Tensor reshape Let us initialise a simple order-3 tensor as follows, ```{r} simple_tensor <- array(1:24, dim=c(3,4,2)) ``` which can be reshaped along its modes 1 and 2, achieved by 'ten_reshape'. As we want to reshape the entire tensor, rather than a tensor time series with the first mode being the time mode, we set 'time.mode=FALSE' below; notice the order of mode indices in 'AA' does not matter, as the indices are in ascending order by convention. ```{r} cat('Reshape along modes 1 and 2:\n') ten_reshape(ten=simple_tensor, AA=c(1,2), time.mode=FALSE) cat('\nThe same results:\n') ten_reshape(ten=simple_tensor, AA=c(2,1), time.mode=FALSE) ``` Certainly we can also reshape our 'simple_tensor' along modes 2 and 3, or even along all modes 1, 2 and 3. ```{r} cat('Reshape along modes 2 and 3:\n') simple_tensor_23 <- ten_reshape(ten=simple_tensor, AA=c(2,3), time.mode=FALSE) simple_tensor_23 cat('\nReshape along modes 1, 2 and 3:\n') ten_reshape(ten=simple_tensor, AA=c(1,2,3), time.mode=FALSE) ``` If we read 'simple_tensor' as a time series with its mode-1 acting as the time mode, the series is an order-2 time series with $t=1,2,3$. In this case, we may only reshape along modes 1 and 2, obtaining a vector time series ```{r} cat('With the time series "simple_tensor", reshape along modes 1 and 2:\n') ten_reshape(ten=simple_tensor, AA=c(1,2), time.mode=TRUE) ``` \ Given a reshaped tensor (time series), we may reshape back and recover the original tensor (time series) given its original dimensions and the set of modes along which it reshapes. This is implemented by the 'ten_reshape_back' function. For example, using 'simple_tensor_23' which is the tensor resulted from reshaping our 'simple_reshape' along modes 2 and 3, we have ```{r} cat('Recover "simple_tensor" by reshaping back "simple_tensor_23":\n') simple_tensor_recover <- ten_reshape_back(ten=simple_tensor_23, AA=c(2,3), original.dim=dim(simple_tensor), time.mode=FALSE) simple_tensor_recover cat(paste0('\nTo see that "simple_tensor" is indeed recovered, the sum of their squared entry differences is: ', sum((simple_tensor_recover - simple_tensor)^2), '.')) ``` ## 3. An example (under the null) to perform the test To demonstrate the test, let us generate an example data using the 'tensorMiss' package. For details of the data generation mechanism, see @CenLam. For reproducibility, a seed parameter is required which we set as 2025. We generate an order-3 tensor time series $\mathcal{Y}_t \in \mathbb{R}^{10\times 10\times 10}$ for $t=1,2,\dots,100$.\ \ Under $H_0$, $\{\mathcal{Y}_t\}$ has a Kronecker product structure and hence follows a Tucker-decomposition tensor factor model. The time series is 'data_H0' as follows, ```{r} K <- 3 #order-3 tensor observed at each time TT <- 100 #time length d <- c(10,10,10) #spatial dimensions r <- c(2,2,2) #rank of core tensor re <- c(2,2,2) #rank of common component in error eta <- list(c(0,0), c(0,0), c(0,0)) #strong factors coef_f <- c(0.3) #AR(1) coefficients for core factor coef_fe <- c(-0.3) #AR(1) coefficients for common component in error coef_e <- c(0.5) #AR(1) coefficients for idiosyncratic part in error data_H0 <- tensorMiss::tensor_gen(K, TT, d, r, re, eta, coef_f, coef_fe, coef_e, seed=2025)$X ``` With the underlying Kronecker product structure, the null should not be rejected for any set of modes. For instance, we perform the test using 'testKron_A' below, given the set of mode indices $\mathcal{A}=\{1,2\}$. In particular, the parameter 'r' takes the true number of factors which can be estimated in practice. Since $(r_1,r_2,r_3)=(2,2,2)$, the rank for the reshaped data (where the last mode is the reshaped mode) is $(r_{\text{reshape},1}, r_V)=(2,4)$. Setting 'r.exact=FALSE' enables the test to attempt all possible rank combinations of $r_V$, which is $(1,4)$, $(2,2)$ and $(4,1)$. (Otherwise only the test using the parameter 'r' is performed.) ```{r} cat('Test "data_H0" along modes 1 and 2:\n') testKron_A(ten=data_H0, AA=c(1,2), r=r, alpha=c(0.01, 0.05), q=0.05, r.exact=FALSE) ``` The resulted list contains three matrices. The matrix 'rank' is the rank combinations attempted, which is ordered by all modes not in $\mathcal{A}$ followed by those modes in $\mathcal{A}$. Rows of the 'level' and 'decision' matrices correspond to the rank combinations in 'rank', while columns of the two matrices correspond to the results under different levels of 'alpha' in order. For the other two matrices,
'level'
Each result of 'level' reports an example test size; under $H_0$, the minimum of each column should be close to the 'alpha' level corresponding to the column.
'decision'
Each result of 'decision' reports the decision statistic (using a 'q'-quantile rule); under $H_0$, the minimum of each column should be upper bounded by the 'alpha' level corresponding to the column.
Since it is heuristic to determine if each 'level' column is close enough to the corresponding 'alpha', the test decision relies on the 'decision' matrix.\ \ The test can also be performed using the mode indices $\{1,3\}$, $\{2,3\}$ or $\{1,2,3\}$. We present the minimum of each 'decision' column and expect them to be no larger than 0.01 and 0.05, respectively. ```{r} cat('Test "data_H0" along modes 1 and 3:\n') apply(testKron_A(ten=data_H0, AA=c(1,3), r=r, alpha=c(0.01, 0.05), q=0.05, r.exact=FALSE)$decision, 2, min) cat('\nTest "data_H0" along modes 2 and 3:\n') apply(testKron_A(ten=data_H0, AA=c(2,3), r=r, alpha=c(0.01, 0.05), q=0.05, r.exact=FALSE)$decision, 2, min) cat('\nTest "data_H0" along modes 1, 2 and 3:\n') apply(testKron_A(ten=data_H0, AA=c(1,2,3), r=r, alpha=c(0.01, 0.05), q=0.05, r.exact=FALSE)$decision, 2, min) ``` With more modes 1, 2 and 3 to test, the result is less satisfying, which can be improved if more observations are available, as shown below. ```{r} TT_large <- 300 #larger time length data_H0_large <- tensorMiss::tensor_gen(K, TT_large, d, r, re, eta, coef_f, coef_fe, coef_e, seed=2025)$X cat('Test "data_H0_large" along modes 1, 2 and 3:\n') apply(testKron_A(ten=data_H0_large, AA=c(1,2,3), r=r, alpha=c(0.01, 0.05), q=0.05, r.exact=FALSE)$decision, 2, min) ``` \ For an order-$K$ tensor time series with $K\geq 3$, it is not so direct to specify the modes to test. When we have no prior information, 'testKron_auto' provides two options: 'all=TRUE' tests for all possible set of mode indices, and 'all=FALSE' runs an algorithm testing if each mode is in the set $\mathcal{A}$ along which the data has no Kronecker product structure. Moreover, the number of factors will be estimated based on eigenvalue ratio if 'r=0'. Similar to 'testKron_A', the resulted list contains three parts:
'decision'
A data frame indicating if each mode is in $\mathcal{A}$ according to each 'alpha' level if 'all' is FALSE, otherwise indicates if each test for a set of mode indices is rejected.
'level'
A data frame reporting an example test size according to each 'alpha' level.
'rank'
A vector reporting the specified/estimated rank for the parameter 'ten' used in the test.
For demonstration, we present the 'decision' data frames and 'rank': ```{r} H0_test_all <- testKron_auto(ten=data_H0, r=0, alpha=c(0.01, 0.05), q=0.05, r.exact=FALSE, all=TRUE) H0_test_algo <- testKron_auto(ten=data_H0, r=0, alpha=c(0.01, 0.05), q=0.05, r.exact=FALSE, all=FALSE) cat(paste0('Testing "data_H0" for all possible set of modes. The estimated rank is: ', H0_test_all$rank[1], ',', H0_test_all$rank[2], ',', H0_test_all$rank[3], '; the decision data frame is:\n')) print.data.frame(H0_test_all$decision) cat(paste0('\nTesting "data_H0" with the algorithm to identify each mode. The estimated rank is: ', H0_test_algo$rank[1], ',', H0_test_algo$rank[2], ',', H0_test_algo$rank[3], '; the decision data frame is:\n')) print.data.frame(H0_test_algo$decision) ``` Again, the test can be improved with more observations: ```{r} H0_test_all_large <- testKron_auto(ten=data_H0_large, r=0, alpha=c(0.01, 0.05), q=0.05, r.exact=FALSE, all=TRUE) H0_test_algo_large <- testKron_auto(ten=data_H0_large, r=0, alpha=c(0.01, 0.05), q=0.05, r.exact=FALSE, all=FALSE) cat(paste0('Testing "data_H0_large" for all possible set of modes. The estimated rank is: ', H0_test_all_large$rank[1], ',', H0_test_all_large$rank[2], ',', H0_test_all_large$rank[3], '; the decision data frame is:\n')) print.data.frame(H0_test_all_large$decision) cat(paste0('\nTesting "data_H0_large" with the algorithm to identify each mode. The estimated rank is: ', H0_test_algo_large$rank[1], ',', H0_test_algo_large$rank[2], ',', H0_test_algo_large$rank[3], '; the decision data frame is:\n')) print.data.frame(H0_test_algo_large$decision) ``` ## 4. Another example (under the alternative) Lastly, we demonstrate the performance of the test if the alternative is true. Suppose an observed order-3 tensor time series $\{\mathcal{Y}_t\}$ has no Kronecker product structure along $\{2,3\}$, i.e., $\{\mathcal{Y}_t\}$ does not follow a Tucker-decomposition factor model in general while the series reshaped along $\{2,3\}$ does. In this case, we generate the reshaped data and only the data reshaped back is observed; notice that the noise is still naturally order-3 tensor-valued which is weakly dependent across time and mode fibers. ```{r} noise_H1 <- tensorMiss::tensor_gen(K, TT, d, r, re, eta, coef_f, coef_fe, coef_e, seed=2025)$X - tensorMiss::tensor_gen(K, TT, d, r, re, eta, coef_f, coef_fe, coef_e, seed=2025)$C #noise series reshape_H1 <- tensorMiss::tensor_gen(K-1, TT, c(d[-c(2,3)],prod(d[c(2,3)])), c(r[-c(2,3)],prod(r[c(2,3)])), c(re[-c(2,3)],prod(re[c(2,3)])), eta[1:(K-1)], coef_f, coef_fe, coef_e, seed=2025) #common component for the reshaped data data_H1 <- ten_reshape_back(reshape_H1$C , c(2,3), c(TT, d)) + noise_H1 ``` \ Using 'testKron_auto', we present the 'decision' data frames and 'rank': ```{r} H1_test_all <- testKron_auto(ten=data_H1, r=0, alpha=c(0.01, 0.05), q=0.05, r.exact=FALSE, all=TRUE) H1_test_algo <- testKron_auto(ten=data_H1, r=0, alpha=c(0.01, 0.05), q=0.05, r.exact=FALSE, all=FALSE) cat(paste0('Testing "data_H1" for all possible set of modes. The estimated rank is: ', H1_test_all$rank[1], ',', H1_test_all$rank[2], ',', H1_test_all$rank[3], '; the decision data frame is:\n')) print.data.frame(H1_test_all$decision) cat(paste0('\nTesting "data_H1" with the algorithm to identify each mode. The estimated rank is: ', H1_test_algo$rank[1], ',', H1_test_algo$rank[2], ',', H1_test_algo$rank[3], '; the decision data frame is:\n')) print.data.frame(H1_test_algo$decision) ``` It worths to note the estimated ranks for modes 2 and 3, which are in fact invalid ranks due to the lost of Kronecker product structure along those modes. On the second result above, the algorithm incorrectly identifies mode-1, which is salvaged by more observations: ```{r} noise_H1_large <- tensorMiss::tensor_gen(K, TT_large, d, r, re, eta, coef_f, coef_fe, coef_e, seed=2025)$X - tensorMiss::tensor_gen(K, TT_large, d, r, re, eta, coef_f, coef_fe, coef_e, seed=2025)$C #noise series reshape_H1_large <- tensorMiss::tensor_gen(K-1, TT_large, c(d[-c(2,3)],prod(d[c(2,3)])), c(r[-c(2,3)],prod(r[c(2,3)])), c(re[-c(2,3)],prod(re[c(2,3)])), eta[1:(K-1)], coef_f, coef_fe, coef_e, seed=2025) #common component for the reshaped data data_H1_large <- ten_reshape_back(reshape_H1_large$C , c(2,3), c(TT_large, d)) + noise_H1_large H1_test_algo_large <- testKron_auto(ten=data_H1_large, r=0, alpha=c(0.01, 0.05), q=0.05, r.exact=FALSE, all=FALSE) cat(paste0('Testing "data_H1_large" with the algorithm to identify each mode. The estimated rank is: ', H1_test_algo_large$rank[1], ',', H1_test_algo_large$rank[2], ',', H1_test_algo_large$rank[3], '; the decision data frame is:\n')) print.data.frame(H1_test_algo_large$decision) ``` ## References