library(KOFM)
A tensor time series is a series of tensor-valued observations (Kolda and Bader 2009), one at each timestamp. The focus of ‘KOFM’ is to test whether a given tensor time series \(\mathcal{Y}_t\), \(t=1,2,\dots,T\), has a Kronecker product structure, i.e., \[ H_0: \{\mathcal{Y}_t\} \text{ has a Kronecker product structure};\\ H_1: \{\mathcal{Y}_t\} \text{ has no Kronecker product structure along a set of indices } \mathcal{A}. \]
As a simple non-trivial example, consider an order-2 tensor time series \(\mathbf{Y}_t\) for \(t=1,2,\dots,T\), i.e., a matrix time series, with the set \(\mathcal{A}=\{1,2\}\). The above hypothesis test is then equivalent to \[ H_0: \{\mathbf{Y}_t\} \text{ follows a matrix factor model such that } \mathbf{Y}_t= \mathbf{A}_1 \mathbf{F}_t \mathbf{A}_2' + \mathbf{E}_t ; \\ H_1: \text{Only the vectorised data } \{\textbf{vec}(\mathbf{Y}_t)\} \text{ has a factor structure such that } \textbf{vec}(\mathbf{Y}_t) = \mathbf{A}_V \mathbf{f}_t + \mathbf{e}_t . \] Under \(H_0\), the vectorised loading matrix specialises to the Kronecker product \(\mathbf{A}_V = \mathbf{A}_2 \otimes \mathbf{A}_1\) with \(\mathbf{f}_t = \textbf{vec}(\mathbf{F}_t)\); under \(H_1\), \(\mathbf{A}_V\) is unrestricted.
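This Kronecker structure can be verified numerically; below is a minimal check of the identity \(\textbf{vec}(\mathbf{A}_1 \mathbf{F}_t \mathbf{A}_2') = (\mathbf{A}_2 \otimes \mathbf{A}_1)\,\textbf{vec}(\mathbf{F}_t)\) in base R, with arbitrary dimensions and no reliance on ‘KOFM’:
set.seed(1)
A1 <- matrix(rnorm(3*2), 3, 2) # mode-1 loading matrix
A2 <- matrix(rnorm(4*2), 4, 2) # mode-2 loading matrix
Ft <- matrix(rnorm(2*2), 2, 2) # core factor at a fixed time point
# vec(A1 Ft A2') equals (A2 kronecker A1) vec(Ft):
all.equal(c(A1 %*% Ft %*% t(A2)), c((A2 %x% A1) %*% c(Ft))) # expected: TRUE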
Under both \(H_0\) and \(H_1\), the reshaped data follows a Tucker-decomposition tensor factor model. Hence we quickly demonstrate the tensor reshape operator, with the ‘ten_reshape’ and ‘ten_reshape_back’ functions, after which we will showcase how the test works using the ‘testKron_A’ and ‘testKron_auto’ functions.
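For reference, in the notation of Kolda and Bader (2009), an order-\(K\) tensor time series following a Tucker-decomposition tensor factor model can be written as \[ \mathcal{Y}_t = \mathcal{F}_t \times_1 \mathbf{A}_1 \times_2 \mathbf{A}_2 \times_3 \cdots \times_K \mathbf{A}_K + \mathcal{E}_t , \] where \(\times_k\) denotes the mode-\(k\) product, \(\mathcal{F}_t\) is the core factor series and \(\mathcal{E}_t\) is the noise series; the matrix model above is the case \(K=2\).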
Let us initialise a simple order-3 tensor as follows,
simple_tensor <- array(1:24, dim=c(3,4,2))
which can be reshaped along its modes 1 and 2 using ‘ten_reshape’. Since we want to reshape the entire tensor, rather than a tensor time series whose first mode is the time mode, we set ‘time.mode=FALSE’ below; notice that the order of mode indices in ‘AA’ does not matter, as the indices are sorted into ascending order by convention.
cat('Reshape along modes 1 and 2:\n')
#> Reshape along modes 1 and 2:
ten_reshape(ten=simple_tensor, AA=c(1,2), time.mode=FALSE)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
#> [1,] 1 2 3 4 5 6 7 8 9 10 11 12
#> [2,] 13 14 15 16 17 18 19 20 21 22 23 24
cat('\nThe same results:\n')
#>
#> The same results:
ten_reshape(ten=simple_tensor, AA=c(2,1), time.mode=FALSE)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
#> [1,] 1 2 3 4 5 6 7 8 9 10 11 12
#> [2,] 13 14 15 16 17 18 19 20 21 22 23 24
Certainly we can also reshape our ‘simple_tensor’ along modes 2 and 3, or even along all modes 1, 2 and 3.
cat('Reshape along modes 2 and 3:\n')
#> Reshape along modes 2 and 3:
simple_tensor_23 <- ten_reshape(ten=simple_tensor, AA=c(2,3), time.mode=FALSE)
simple_tensor_23
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 1 4 7 10 13 16 19 22
#> [2,] 2 5 8 11 14 17 20 23
#> [3,] 3 6 9 12 15 18 21 24
cat('\nReshape along modes 1, 2 and 3:\n')
#>
#> Reshape along modes 1, 2 and 3:
ten_reshape(ten=simple_tensor, AA=c(1,2,3), time.mode=FALSE)
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
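As a quick sanity check (an observation consistent with the output above rather than a documented guarantee), reshaping along all modes coincides with base R's column-major vectorisation for this example:
all(ten_reshape(ten=simple_tensor, AA=c(1,2,3), time.mode=FALSE) == as.vector(simple_tensor))
# expected: TRUE, matching the 1:24 output above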
If we read ‘simple_tensor’ as a time series with mode 1 acting as the time mode, it is an order-2 tensor time series observed at \(t=1,2,3\), with each observation a \(4\times 2\) matrix. In this case, we can only reshape along modes 1 and 2 (the modes of each observation), obtaining a vector time series
cat('With the time series "simple_tensor", reshape along modes 1 and 2:\n')
#> With the time series "simple_tensor", reshape along modes 1 and 2:
ten_reshape(ten=simple_tensor, AA=c(1,2), time.mode=TRUE)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 1 4 7 10 13 16 19 22
#> [2,] 2 5 8 11 14 17 20 23
#> [3,] 3 6 9 12 15 18 21 24
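Each row of the result is one vectorised observation. As a quick check, again consistent with the output above, row 1 should equal the vectorised first slice:
all(ten_reshape(ten=simple_tensor, AA=c(1,2), time.mode=TRUE)[1,] == c(simple_tensor[1,,]))
# expected: TRUE: row 1 is the vectorised 4 x 2 observation at t=1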
Given a reshaped tensor (time series), we may reshape back and recover the original tensor (time series), given its original dimensions and the set of modes along which it was reshaped. This is implemented by the ‘ten_reshape_back’ function. For example, using ‘simple_tensor_23’, the tensor resulting from reshaping ‘simple_tensor’ along modes 2 and 3, we have
cat('Recover "simple_tensor" by reshaping back "simple_tensor_23":\n')
#> Recover "simple_tensor" by reshaping back "simple_tensor_23":
simple_tensor_recover <- ten_reshape_back(ten=simple_tensor_23, AA=c(2,3), original.dim=dim(simple_tensor), time.mode=FALSE)
simple_tensor_recover
#> , , 1
#>
#> [,1] [,2] [,3] [,4]
#> [1,] 1 4 7 10
#> [2,] 2 5 8 11
#> [3,] 3 6 9 12
#>
#> , , 2
#>
#> [,1] [,2] [,3] [,4]
#> [1,] 13 16 19 22
#> [2,] 14 17 20 23
#> [3,] 15 18 21 24
cat(paste0('\nTo see that "simple_tensor" is indeed recovered, the sum of their squared entry differences is: ', sum((simple_tensor_recover - simple_tensor)^2), '.'))
#>
#> To see that "simple_tensor" is indeed recovered, the sum of their squared entry differences is: 0.
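The same round trip works in the time-series reading; the sketch below assumes ‘ten_reshape_back’ accepts the ‘time.mode’ flag in the same way as ‘ten_reshape’, with ‘original.dim’ including the time mode:
ts_reshaped <- ten_reshape(ten=simple_tensor, AA=c(1,2), time.mode=TRUE)
ts_recover <- ten_reshape_back(ten=ts_reshaped, AA=c(1,2), original.dim=dim(simple_tensor), time.mode=TRUE)
sum((ts_recover - simple_tensor)^2) # expected: 0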
To demonstrate the test, let us generate example data using the ‘tensorMiss’ package. For details of the data generation mechanism, see Cen and Lam (2024). For reproducibility, a seed parameter is required, which we set to 2025. We generate an order-3 tensor time series \(\mathcal{Y}_t \in \mathbb{R}^{10\times 10\times 10}\) for \(t=1,2,\dots,100\). Under \(H_0\), \(\{\mathcal{Y}_t\}\) has a Kronecker product structure and hence follows a Tucker-decomposition tensor factor model. The time series ‘data_H0’ is generated as follows,
K <- 3 #order-3 tensor observed at each time
TT <- 100 #time length
d <- c(10,10,10) #spatial dimensions
r <- c(2,2,2) #rank of core tensor
re <- c(2,2,2) #rank of common component in error
eta <- list(c(0,0), c(0,0), c(0,0)) #strong factors
coef_f <- c(0.3) #AR(1) coefficients for core factor
coef_fe <- c(-0.3) #AR(1) coefficients for common component in error
coef_e <- c(0.5) #AR(1) coefficients for idiosyncratic part in error
data_H0 <- tensorMiss::tensor_gen(K, TT, d, r, re, eta, coef_f, coef_fe, coef_e, seed=2025)$X
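Before testing, it may help to confirm the shape of the generated series; by the time-series convention used by the reshape functions above (an assumption here, not a documented guarantee of ‘tensor_gen’), the first mode is the time mode:
dim(data_H0) # expected: 100 10 10 10, i.e. T=100 observations of a 10 x 10 x 10 tensor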
With the underlying Kronecker product structure, the null should not be rejected for any set of modes. For instance, we perform the test using ‘testKron_A’ below, with the set of mode indices \(\mathcal{A}=\{1,2\}\). In particular, the parameter ‘r’ takes the true number of factors, which in practice can be estimated. Since \((r_1,r_2,r_3)=(2,2,2)\), the rank for the reshaped data (where the last mode is the reshaped mode) is \((r_{\text{reshape},1}, r_V)=(2,4)\). Setting ‘r.exact=FALSE’ makes the test attempt all possible factorisations of \(r_V=4\), namely \((1,4)\), \((2,2)\) and \((4,1)\); otherwise only the test using the parameter ‘r’ is performed.
cat('Test "data_H0" along modes 1 and 2:\n')
#> Test "data_H0" along modes 1 and 2:
testKron_A(ten=data_H0, AA=c(1,2), r=r, alpha=c(0.01, 0.05), q=0.05, r.exact=FALSE)
#> $level
#> [,1] [,2]
#> [1,] 0.04 0.12
#> [2,] 0.01 0.06
#> [3,] 0.01 0.08
#>
#> $decision
#> [,1] [,2]
#> [1,] 0.0100 0.0495
#> [2,] 0.0095 0.0400
#> [3,] 0.0100 0.0400
#>
#> $rank
#> [,1] [,2] [,3]
#> [1,] 2 1 4
#> [2,] 2 2 2
#> [3,] 2 4 1
Since judging whether each ‘level’ column is close enough to the corresponding ‘alpha’ is heuristic, the test decision relies on the ‘decision’ matrix.
The test can also be performed using the mode indices \(\{1,3\}\), \(\{2,3\}\) or \(\{1,2,3\}\). We present the minimum of each
‘decision’ column and expect them to be no larger than 0.01 and 0.05,
respectively.
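The rule just described can be wrapped in a small convenience function; ‘kron_not_rejected’ below is hypothetical (not part of ‘KOFM’) and simply compares the column minima against the levels in ‘alpha’:
# Hypothetical helper: H0 is not rejected at level alpha[j] whenever the
# minimum of the j-th 'decision' column is no larger than alpha[j].
kron_not_rejected <- function(test_out, alpha = c(0.01, 0.05)) {
  apply(test_out$decision, 2, min) <= alpha
}
For instance, applied to the test along modes 1, 2 and 3 below, it would return ‘TRUE FALSE’, matching the discussion that follows.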
cat('Test "data_H0" along modes 1 and 3:\n')
#> Test "data_H0" along modes 1 and 3:
apply(testKron_A(ten=data_H0, AA=c(1,3), r=r, alpha=c(0.01, 0.05), q=0.05, r.exact=FALSE)$decision, 2, min)
#> [1] 0.01 0.04
cat('\nTest "data_H0" along modes 2 and 3:\n')
#>
#> Test "data_H0" along modes 2 and 3:
apply(testKron_A(ten=data_H0, AA=c(2,3), r=r, alpha=c(0.01, 0.05), q=0.05, r.exact=FALSE)$decision, 2, min)
#> [1] 0.01 0.04
cat('\nTest "data_H0" along modes 1, 2 and 3:\n')
#>
#> Test "data_H0" along modes 1, 2 and 3:
apply(testKron_A(ten=data_H0, AA=c(1,2,3), r=r, alpha=c(0.01, 0.05), q=0.05, r.exact=FALSE)$decision, 2, min)
#> [1] 0.01 0.06
With more modes (1, 2 and 3 together) to test, the result is less satisfactory; it improves when more observations are available, as shown below.
TT_large <- 300 #larger time length
data_H0_large <- tensorMiss::tensor_gen(K, TT_large, d, r, re, eta, coef_f, coef_fe, coef_e, seed=2025)$X
cat('Test "data_H0_large" along modes 1, 2 and 3:\n')
#> Test "data_H0_large" along modes 1, 2 and 3:
apply(testKron_A(ten=data_H0_large, AA=c(1,2,3), r=r, alpha=c(0.01, 0.05), q=0.05, r.exact=FALSE)$decision, 2, min)
#> [1] 0.01000000 0.04983333
In practice the number of factors is unknown, so we use ‘testKron_auto’, which estimates the ranks (here ‘r=0’) and either tests all possible sets of modes (‘all=TRUE’) or runs an algorithm to identify each mode (‘all=FALSE’). For demonstration, we present the ‘decision’ data frames and ‘rank’:
H0_test_all <- testKron_auto(ten=data_H0, r=0, alpha=c(0.01, 0.05), q=0.05, r.exact=FALSE, all=TRUE)
H0_test_algo <- testKron_auto(ten=data_H0, r=0, alpha=c(0.01, 0.05), q=0.05, r.exact=FALSE, all=FALSE)
cat(paste0('Testing "data_H0" for all possible set of modes. The estimated rank is: ', H0_test_all$rank[1], ',', H0_test_all$rank[2], ',', H0_test_all$rank[3], '; the decision data frame is:\n'))
#> Testing "data_H0" for all possible set of modes. The estimated rank is: 2,2,2; the decision data frame is:
print.data.frame(H0_test_all$decision)
#> Set of Modes to be tested 0.01 0.05
#> 1 {1,2} 0 0
#> 2 {1,3} 0 0
#> 3 {2,3} 0 0
#> 4 {1,2,3} 0 1
cat(paste0('\nTesting "data_H0" with the algorithm to identify each mode. The estimated rank is: ', H0_test_algo$rank[1], ',', H0_test_algo$rank[2], ',', H0_test_algo$rank[3], '; the decision data frame is:\n'))
#>
#> Testing "data_H0" with the algorithm to identify each mode. The estimated rank is: 2,2,2; the decision data frame is:
print.data.frame(H0_test_algo$decision)
#> Mode to be identified 0.01 0.05
#> 1 1 0 1
#> 2 2 0 1
#> 3 3 0 1
Again, the test can be improved with more observations:
H0_test_all_large <- testKron_auto(ten=data_H0_large, r=0, alpha=c(0.01, 0.05), q=0.05, r.exact=FALSE, all=TRUE)
H0_test_algo_large <- testKron_auto(ten=data_H0_large, r=0, alpha=c(0.01, 0.05), q=0.05, r.exact=FALSE, all=FALSE)
cat(paste0('Testing "data_H0_large" for all possible set of modes. The estimated rank is: ', H0_test_all_large$rank[1], ',', H0_test_all_large$rank[2], ',', H0_test_all_large$rank[3], '; the decision data frame is:\n'))
#> Testing "data_H0_large" for all possible set of modes. The estimated rank is: 2,2,2; the decision data frame is:
print.data.frame(H0_test_all_large$decision)
#> Set of Modes to be tested 0.01 0.05
#> 1 {1,2} 0 0
#> 2 {1,3} 0 0
#> 3 {2,3} 0 0
#> 4 {1,2,3} 0 0
cat(paste0('\nTesting "data_H0_large" with the algorithm to identify each mode. The estimated rank is: ', H0_test_algo_large$rank[1], ',', H0_test_algo_large$rank[2], ',', H0_test_algo_large$rank[3], '; the decision data frame is:\n'))
#>
#> Testing "data_H0_large" with the algorithm to identify each mode. The estimated rank is: 2,2,2; the decision data frame is:
print.data.frame(H0_test_algo_large$decision)
#> Mode to be identified 0.01 0.05
#> 1 1 0 0
#> 2 2 0 0
#> 3 3 0 0
Lastly, we demonstrate the performance of the test when the alternative is true. Suppose an observed order-3 tensor time series \(\{\mathcal{Y}_t\}\) has no Kronecker product structure along \(\{2,3\}\), i.e., \(\{\mathcal{Y}_t\}\) does not in general follow a Tucker-decomposition factor model, while the series reshaped along \(\{2,3\}\) does. In this case, we generate the common component of the reshaped data, and only the data reshaped back is observed; notice that the noise is still naturally order-3 tensor-valued and is weakly dependent across time and along the mode fibres.
noise_H1 <- tensorMiss::tensor_gen(K, TT, d, r, re, eta, coef_f, coef_fe, coef_e, seed=2025)$X - tensorMiss::tensor_gen(K, TT, d, r, re, eta, coef_f, coef_fe, coef_e, seed=2025)$C #noise series
reshape_H1 <- tensorMiss::tensor_gen(K-1, TT, c(d[-c(2,3)],prod(d[c(2,3)])), c(r[-c(2,3)],prod(r[c(2,3)])), c(re[-c(2,3)],prod(re[c(2,3)])), eta[1:(K-1)], coef_f, coef_fe, coef_e, seed=2025) #common component for the reshaped data
data_H1 <- ten_reshape_back(reshape_H1$C, c(2,3), c(TT, d)) + noise_H1
Using ‘testKron_auto’, we present the ‘decision’ data frames and
‘rank’:
H1_test_all <- testKron_auto(ten=data_H1, r=0, alpha=c(0.01, 0.05), q=0.05, r.exact=FALSE, all=TRUE)
H1_test_algo <- testKron_auto(ten=data_H1, r=0, alpha=c(0.01, 0.05), q=0.05, r.exact=FALSE, all=FALSE)
cat(paste0('Testing "data_H1" for all possible set of modes. The estimated rank is: ', H1_test_all$rank[1], ',', H1_test_all$rank[2], ',', H1_test_all$rank[3], '; the decision data frame is:\n'))
#> Testing "data_H1" for all possible set of modes. The estimated rank is: 2,4,3; the decision data frame is:
print.data.frame(H1_test_all$decision)
#> Set of Modes to be tested 0.01 0.05
#> 1 {1,2} 0 0
#> 2 {1,3} 0 0
#> 3 {2,3} 1 1
#> 4 {1,2,3} 1 1
cat(paste0('\nTesting "data_H1" with the algorithm to identify each mode. The estimated rank is: ', H1_test_algo$rank[1], ',', H1_test_algo$rank[2], ',', H1_test_algo$rank[3], '; the decision data frame is:\n'))
#>
#> Testing "data_H1" with the algorithm to identify each mode. The estimated rank is: 2,4,3; the decision data frame is:
print.data.frame(H1_test_algo$decision)
#> Mode to be identified 0.01 0.05
#> 1 1 1 1
#> 2 2 1 1
#> 3 3 1 1
It is worth noting the estimated ranks for modes 2 and 3, which are in fact invalid due to the loss of Kronecker product structure along those modes. In the second result above, the algorithm also incorrectly flags mode 1, which is remedied by more observations:
noise_H1_large <- tensorMiss::tensor_gen(K, TT_large, d, r, re, eta, coef_f, coef_fe, coef_e, seed=2025)$X - tensorMiss::tensor_gen(K, TT_large, d, r, re, eta, coef_f, coef_fe, coef_e, seed=2025)$C #noise series
reshape_H1_large <- tensorMiss::tensor_gen(K-1, TT_large, c(d[-c(2,3)],prod(d[c(2,3)])), c(r[-c(2,3)],prod(r[c(2,3)])), c(re[-c(2,3)],prod(re[c(2,3)])), eta[1:(K-1)], coef_f, coef_fe, coef_e, seed=2025) #common component for the reshaped data
data_H1_large <- ten_reshape_back(reshape_H1_large$C, c(2,3), c(TT_large, d)) + noise_H1_large
H1_test_algo_large <- testKron_auto(ten=data_H1_large, r=0, alpha=c(0.01, 0.05), q=0.05, r.exact=FALSE, all=FALSE)
cat(paste0('Testing "data_H1_large" with the algorithm to identify each mode. The estimated rank is: ', H1_test_algo_large$rank[1], ',', H1_test_algo_large$rank[2], ',', H1_test_algo_large$rank[3], '; the decision data frame is:\n'))
#> Testing "data_H1_large" with the algorithm to identify each mode. The estimated rank is: 2,1,3; the decision data frame is:
print.data.frame(H1_test_algo_large$decision)
#> Mode to be identified 0.01 0.05
#> 1 1 0 0
#> 2 2 1 1
#> 3 3 1 1
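As a final, hypothetical post-processing step, the modes rejected at the 5% level can be read off the ‘decision’ data frame; we index by column position here, since the displayed column names are formatting-dependent:
dec <- H1_test_algo_large$decision
dec[dec[, 3] == 1, 1] # modes lacking Kronecker structure at the 5% level; expected: 2 and 3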