# sphunif: Uniformity Tests on the Circle, Sphere, and Hypersphere

## Just give me a quick example!

### Circular data

Suppose that you want to test if a sample of circular data is uniformly distributed. For example, the following circular uniform sample in radians:

set.seed(987202226)
cir_data <- runif(n = 30, min = 0, max = 2 * pi)

Then you can call the main function in the sphunif package, unif_test, specifying the type of test to be performed. For example, the Watson (omnibus) test:

library(sphunif)
#> Warning: package 'Rcpp' was built under R version 4.2.3
unif_test(data = cir_data, type = "Watson") # An htest object
#>
#>  Watson test of circular uniformity
#>
#> data:  cir_data
#> statistic = 0.03701, p-value = 0.8584
#> alternative hypothesis: any alternative to circular uniformity

By default, the calibration of the test statistic is done by Monte Carlo. This can be changed with p_value = "asymp" to use the asymptotic distribution:

unif_test(data = cir_data, type = "Watson", p_value = "MC") # Monte Carlo
#> Warning: package 'doFuture' was built under R version 4.2.3
#> Warning: package 'future' was built under R version 4.2.3
#>
#>  Watson test of circular uniformity
#>
#> data:  cir_data
#> statistic = 0.03701, p-value = 0.8662
#> alternative hypothesis: any alternative to circular uniformity
unif_test(data = cir_data, type = "Watson", p_value = "asymp") # Asymp. distr.
#>
#>  Watson test of circular uniformity
#>
#> data:  cir_data
#> statistic = 0.03701, p-value = 0.8584
#> alternative hypothesis: any alternative to circular uniformity

You can perform several tests within a single call to unif_test. Choose the available circular uniformity tests from

avail_cir_tests
#>  [1] "Ajne"           "Bakshaev"       "Bingham"        "Cressie"
#>  [5] "CCF09"          "FG01"           "Gine_Fn"        "Gine_Gn"
#>  [9] "Gini"           "Gini_squared"   "Greenwood"      "Hermans_Rasson"
#> [13] "Hodges_Ajne"    "Kuiper"         "Log_gaps"       "Max_uncover"
#> [17] "Num_uncover"    "PAD"            "PCvM"           "Poisson"
#> [21] "PRt"            "Pycke"          "Pycke_q"        "Range"
#> [25] "Rao"            "Rayleigh"       "Riesz"          "Rothman"
#> [29] "Sobolev"        "Softmax"        "Vacancy"        "Watson"
#> [33] "Watson_1976"

For example, you can use the Projected Anderson–Darling (García-Portugués, Navarro-Esteban, and Cuesta-Albertos (2023), also an omnibus test) test and the Watson test:

# A *list* of htest objects
unif_test(data = cir_data, type = c("PAD", "Watson"))
#> Series truncated from 10000 to 10000 terms (difference <= 0 with the HBE tail probability; last weight = 1.137e-08).
#> $PAD #> #> Projected Anderson-Darling test of circular uniformity #> #> data: cir_data #> statistic = 0.54217, p-value = 0.8403 #> alternative hypothesis: any alternative to circular uniformity #> #> #>$Watson
#>
#>  Watson test of circular uniformity
#>
#> data:  cir_data
#> statistic = 0.03701, p-value = 0.8584
#> alternative hypothesis: any alternative to circular uniformity

García-Portugués and Verdebout (2018) gives a review of different tests of uniformity on the circle. Section 5.1 in Pewsey and García-Portugués (2021) also gives an overview of recent advances.

### Spherical data

Suppose now that you want to test if a sample of spherical data is uniformly distributed. Consider a non-uniformly-generated1 sample of directions in Cartesian coordinates, such as:

# Sample data on S^2
set.seed(987202226)
theta <- runif(n = 30, min = 0, max = 2 * pi)
phi <- runif(n = 30, min = 0, max = pi)
sph_data <- cbind(cos(theta) * sin(phi), sin(theta) * sin(phi), cos(phi))

The available spherical uniformity tests:

avail_sph_tests
#>  [1] "Ajne"        "Bakshaev"    "Bingham"     "CCF09"       "CJ12"
#>  [6] "Gine_Fn"     "Gine_Gn"     "PAD"         "PCvM"        "Poisson"
#> [11] "PRt"         "Pycke"       "Sobolev"     "Softmax"     "Stereo"
#> [16] "Rayleigh"    "Rayleigh_HD" "Riesz"

See again García-Portugués and Verdebout (2018) for a review of tests of uniformity on the sphere and complementary Section 5.1 in Pewsey and García-Portugués (2021).

The default type = "all" equals type = avail_sph_tests, which might be too much for standard analysis:

unif_test(data = sph_data, type = "all", p_value = "MC", M = 1e2)
#> $Ajne #> #> Ajne test of spherical uniformity #> #> data: sph_data #> statistic = 0.057937, p-value = 1 #> alternative hypothesis: any non-axial alternative to spherical uniformity #> #> #>$Bakshaev
#>
#>  Bakshaev (2010) test of spherical uniformity
#>
#> data:  sph_data
#> statistic = 1.0215, p-value = 0.61
#> alternative hypothesis: any alternative to spherical uniformity
#>
#>
#> $Bingham #> #> Bingham test of spherical uniformity #> #> data: sph_data #> statistic = 17.573, p-value < 2.2e-16 #> alternative hypothesis: scatter matrix different from constant #> #> #>$CCF09
#>
#>  Cuesta-Albertos et al. (2009) test of spherical uniformity with k = 50
#>
#> data:  sph_data
#> statistic = 1.1355, p-value = 0.79
#> alternative hypothesis: any alternative to spherical uniformity
#>
#>
#> $CJ12 #> #> Cai and Jiang (2012) test of spherical uniformity #> #> data: sph_data #> statistic = 19.442, p-value = 0.77 #> alternative hypothesis: unclear consistency #> #> #>$Gine_Fn
#>
#>  Gine's Fn test of spherical uniformity
#>
#> data:  sph_data
#> statistic = 1.5391, p-value = 0.43
#> alternative hypothesis: any alternative to spherical uniformity
#>
#>
#> $Gine_Gn #> #> Gine's Gn test of spherical uniformity #> #> data: sph_data #> statistic = 1.3074, p-value < 2.2e-16 #> alternative hypothesis: any axial alternative to spherical uniformity #> #> #>$PAD
#>
#>  Projected Anderson-Darling test of spherical uniformity
#>
#> data:  sph_data
#> statistic = 0.91653, p-value = 0.52
#> alternative hypothesis: any alternative to spherical uniformity
#>
#>
#> $PCvM #> #> Projected Cramer-von Mises test of spherical uniformity #> #> data: sph_data #> statistic = 0.12769, p-value = 0.61 #> alternative hypothesis: any alternative to spherical uniformity #> #> #>$Poisson
#>
#>  Poisson-kernel test of spherical uniformity with rho = 0.5
#>
#> data:  sph_data
#> statistic = 0.24935, p-value = 0.13
#> alternative hypothesis: any alternative to spherical uniformity for rho > 0
#>
#>
#> $PRt #> #> Projected Rothman test of spherical uniformity with t = 0.333 #> #> data: sph_data #> statistic = 0.15523, p-value = 0.64 #> alternative hypothesis: any alternative to spherical uniformity if t is irrational #> #> #>$Pycke
#>
#>  Pycke test of spherical uniformity
#>
#> data:  sph_data
#> statistic = 0.042839, p-value = 0.3
#> alternative hypothesis: any alternative to spherical uniformity
#>
#>
#> $Sobolev #> #> Finite Sobolev test of spherical uniformity with vk2 = c(0, 0, 1) #> #> data: sph_data #> statistic = 4.3066, p-value = 0.72 #> alternative hypothesis: alternatives in the spherical harmonics subspace with vk2 ≠ 0 #> #> #>$Softmax
#>
#>  Softmax test of spherical uniformit with kappa = 1
#>
#> data:  sph_data
#> statistic = -0.065236, p-value = 0.53
#> alternative hypothesis: any alternative to spherical uniformity for kappa > 0
#>
#>
#> $Stereo #> #> Stereographic projection test of spherical uniformity with a = 0 #> #> data: sph_data #> statistic = 3.2993, p-value = 0.17 #> alternative hypothesis: any alternative to spherical uniformity for |a| < 1 #> #> #>$Rayleigh
#>
#>  Rayleigh test of spherical uniformity
#>
#> data:  sph_data
#> statistic = 0.13345, p-value = 0.99
#> alternative hypothesis: mean direction different from zero
#>
#>
#> $Rayleigh_HD #> #> HD-standardized Rayleigh test of spherical uniformity #> #> data: sph_data #> statistic = -1.1703, p-value = 0.99 #> alternative hypothesis: mean direction different from zero #> #> #>$Riesz
#>
#>  Warning! This is an experimental test not meant to be used with s = 1
#>
#> data:  sph_data
#> statistic = 1.0215, p-value = 0.61
#> alternative hypothesis: unclear, experimental test
unif_test(data = sph_data, type = "Rayleigh", p_value = "asymp")
#>
#>  Rayleigh test of spherical uniformity
#>
#> data:  sph_data
#> statistic = 0.13345, p-value = 0.9875
#> alternative hypothesis: mean direction different from zero

### Higher dimensions

The hyperspherical setting is treated analogously to the spherical setting, and the available tests are exactly the same (avail_sph_tests). Here is an example of testing uniformity with a sample of size 100 on the $$9$$-sphere:

# Sample data on S^9
hyp_data <- r_unif_sph(n = 30, p = 10)

# Test
unif_test(data = hyp_data, type = "Rayleigh", p_value = "asymp")
#>
#>  Rayleigh test of spherical uniformity
#>
#> data:  hyp_data
#> statistic = 9.241, p-value = 0.5094
#> alternative hypothesis: mean direction different from zero

## A data example: are Venusian craters uniformly distributed?

The following snippet partially reproduces the data application in García-Portugués, Navarro-Esteban, and Cuesta-Albertos (2021) on testing the uniformity of known Venusian craters. Incidentally, it also illustrates how to monitor the progress of a Monte Carlo simulation when p_value = "MC" using progressr.

# Load spherical data
data(venus)
#>        name diameter     theta      phi
#> 1    Janice     10.0 4.5724136 1.523672
#> 2  HuaMulan     24.0 5.8939769 1.514946
#> 3   Tatyana     19.0 3.7070793 1.490511
#> 4 Landowska     33.0 1.2967796 1.476025
#> 5 Ruslanova     44.3 0.2897247 1.465029
#> 6     Sveta     21.0 4.7684140 1.439181
nrow(venus)
#> [1] 967

# Compute Cartesian coordinates from polar coordinates
venus$X <- cbind(cos(venus$theta) * cos(venus$phi), sin(venus$theta) * cos(venus$phi), sin(venus$phi))

# Test uniformity using the Projected Cramér-von Mises and Projected
# Anderson-Darling tests on S^2, both using asymptotic distributions
unif_test(data = venus$X, type = c("PCvM", "PAD"), p_value = "asymp") #> Series truncated from 10000 to 10000 terms (difference <= 0 with the HBE tail probability; last weight = 6.249e-14). #> Series truncated from 10000 to 10000 terms (difference <= 0 with the HBE tail probability; last weight = 4.999e-13). #>$PCvM
#>
#>  Projected Cramer-von Mises test of spherical uniformity
#>
#> data:  venus$X #> statistic = 0.25844, p-value = 0.1272 #> alternative hypothesis: any alternative to spherical uniformity #> #> #>$PAD
#>
#>  Projected Anderson-Darling test of spherical uniformity
#>
#> data:  venus$X #> statistic = 1.5077, p-value = 0.1149 #> alternative hypothesis: any alternative to spherical uniformity # Define a handler for progressr require(progress) #> Loading required package: progress #> Warning: package 'progress' was built under R version 4.2.3 require(progressr) #> Loading required package: progressr handlers(handler_progress( format = paste("(:spin) [:bar] :percent Iter: :current/:total Rate:", ":tick_rate iter/sec ETA: :eta Elapsed: :elapsedfull"), clear = FALSE)) # Test uniformity using Monte-Carlo approximated null distributions with_progress( unif_test(data = venus$X, type = c("PCvM", "PAD"),
p_value = "MC", chunks = 1e2, M = 5e2, cores = 2)
)
#> $PCvM #> #> Projected Cramer-von Mises test of spherical uniformity #> #> data: venus$X
#> statistic = 0.25844, p-value = 0.116
#> alternative hypothesis: any alternative to spherical uniformity
#>
#>
#> $PAD #> #> Projected Anderson-Darling test of spherical uniformity #> #> data: venus$X
#> statistic = 1.5077, p-value = 0.104
#> alternative hypothesis: any alternative to spherical uniformity

## Simulation studies done simple

unif_stat allows to compute several statistics to different samples within a single call, thus facilitating Monte Carlo experiments:

# M samples of size n on S^2
samps_sph <- r_unif_sph(n = 30, p = 3, M = 10)

# Apply all statistics to the M samples
unif_stat(data = samps_sph, type = "all")
#>          Ajne  Bakshaev   Bingham     CCF09     CJ12   Gine_Fn   Gine_Gn
#> 1  0.25642629 1.2740243 3.0921699 1.3219666 18.36313 1.3878146 0.3621095
#> 2  0.07117203 0.6772576 6.9893732 0.9149047 24.98233 0.9265608 0.6418727
#> 3  0.22589028 1.1723079 3.4333269 1.1726659 22.55802 1.3162019 0.4126408
#> 4  0.19451359 0.9883296 2.3956280 1.2898928 26.17551 1.0994251 0.3213708
#> 5  0.34557032 1.7121433 3.4357302 1.4104864 23.14723 1.8216276 0.4393463
#> 6  0.52106149 2.5014190 6.5814801 1.5830810 24.59617 2.6339081 0.5496621
#> 7  0.37727819 1.8175025 3.7137802 1.4601007 21.03363 1.9357469 0.4266342
#> 8  0.17723191 0.8598836 0.1801498 1.2168187 22.37936 0.9713328 0.2624051
#> 9  0.26937836 1.3273645 3.5939142 1.5021719 21.16927 1.4685129 0.3909994
#> 10 0.20070040 1.2037837 6.2135243 1.3720805 24.17573 1.4248372 0.6220356
#>          PAD       PCvM     Poisson       PRt        Pycke   Sobolev
#> 1  0.9395839 0.15925304 -0.09593059 0.2167113 -0.022374854  4.055000
#> 2  0.5945587 0.08465719 -0.20176173 0.1096982 -0.106597617  2.585482
#> 3  0.8936470 0.14653849 -0.06847430 0.1945375 -0.017132971  7.166466
#> 4  0.7531209 0.12354120 -0.21551233 0.1638546 -0.082808626  5.534846
#> 5  1.2281411 0.21401791  0.07538763 0.3046020  0.072236131  3.816622
#> 6  1.7442098 0.31267737  0.41376804 0.4279620  0.177738562 10.600319
#> 7  1.3082009 0.22718781  0.16803689 0.3062035  0.104767601  9.554070
#> 8  0.7010165 0.10748545 -0.10427448 0.1337058 -0.056249730 13.999276
#> 9  0.9970355 0.16592056  0.01155793 0.2155874  0.001478758 11.847208
#> 10 0.9429167 0.15047296  0.02887818 0.2031502  0.025172789  4.733793
#>        Softmax      Stereo  Rayleigh  Rayleigh_HD     Riesz
#> 1  -0.04452033  0.97559309 3.1198646  0.048934523 1.2740243
#> 2  -0.31080002 -3.43368909 0.4292885 -1.049488573 0.6772576
#> 3  -0.10493392 -0.01949183 2.5215007 -0.195346508 1.1723079
#> 4  -0.18903811 -3.10566918 2.1572087 -0.344068123 0.9883296
#> 5   0.16466927  2.06393929 4.5929816  0.650331995 1.7121433
#> 6   0.61467268  1.59056142 7.1476194  1.693258549 2.5014190
#> 7   0.21878763  4.31581041 4.7983718  0.734182229 1.8175025
#> 8  -0.31301945 -1.07841571 1.4134453 -0.647708254 0.8598836
#> 9  -0.01942868 -0.85602632 3.0045142  0.001842922 1.3273645
#> 10 -0.08055493  1.98899355 2.2218008 -0.317698486 1.2037837

Additionally, unif_stat_MC is an utility for performing the previous simulation through a convenient parallel wrapper:

# Break the simulation in 10 chunks of tasks to be divided between 2 cores
sim <- unif_stat_MC(n = 30, type = "all", p = 3, M = 1e2, cores = 2,
chunks = 10)

# Critical values for test statistics
sim$crit_val_MC #> Ajne Bakshaev Bingham CCF09 CJ12 Gine_Fn Gine_Gn PAD #> 0.1 0.3985377 1.900965 9.641849 1.607723 25.76571 2.071965 0.7824702 1.370771 #> 0.05 0.4293123 2.119489 10.883988 1.673388 26.04390 2.208151 0.9104627 1.439705 #> 0.01 0.4862980 2.367336 12.929769 1.768780 27.71544 2.509586 0.9826411 1.663297 #> PCvM Poisson PRt Pycke Sobolev Softmax Stereo #> 0.1 0.2376206 0.2275968 0.3364439 0.09537109 10.79428 0.3209408 4.763500 #> 0.05 0.2649362 0.3387827 0.3719113 0.12780765 12.55933 0.3995652 6.189260 #> 0.01 0.2959170 0.4855194 0.4103732 0.21006212 17.63509 0.5294922 9.170729 #> Rayleigh Rayleigh_HD Riesz #> 0.1 5.348216 0.9586553 1.900965 #> 0.05 5.977951 1.2157434 2.119489 #> 0.01 6.686576 1.5050384 2.367336 # Test statistics head(sim$stats_MC)
#>         Ajne  Bakshaev   Bingham    CCF09     CJ12   Gine_Fn   Gine_Gn
#> 1 0.21694040 1.0775492  2.453843 1.259021 25.66338 1.1994472 0.3316856
#> 2 0.12807204 0.7713214  4.378542 1.050875 22.82851 0.9488295 0.4365413
#> 3 0.29297019 1.3318434  1.809964 1.606800 23.17901 1.4254091 0.2535283
#> 4 0.08683535 0.7481899  6.545450 1.100101 16.60092 1.0029567 0.6556153
#> 5 0.23053105 1.5359203 12.536621 1.250269 15.47708 1.9035017 0.9813775
#> 6 0.22807235 1.1421569  3.120667 1.267351 25.84960 1.2559271 0.3436377
#>         PAD       PCvM     Poisson       PRt       Pycke   Sobolev     Softmax
#> 1 0.8240941 0.13469365 -0.11669766 0.1761758 -0.05307337  9.007942 -0.15359967
#> 2 0.6362508 0.09641517 -0.22109092 0.1188851 -0.10675328  7.122146 -0.28051417
#> 3 0.9845993 0.16648043 -0.02275223 0.2125231 -0.01783641 15.989324 -0.02079736
#> 4 0.6549005 0.09352374 -0.09784943 0.1193201 -0.07731601  6.343550 -0.29460718
#> 5 1.2113886 0.19199003  0.30421787 0.2476503  0.14422646  6.006292  0.13450569
#> 6 0.8495502 0.14276961 -0.16115142 0.1881344 -0.07465451  6.871270 -0.09297427
#>      Stereo  Rayleigh Rayleigh_HD     Riesz
#> 1 -2.187542 2.3063524 -0.28318046 1.0775492
#> 2 -3.742812 1.0530639 -0.79483333 0.7713214
#> 3 -1.257477 3.2368961  0.09671245 1.3318434
#> 4 -2.709981 0.5177747 -1.01336422 0.7481899
#> 5  9.773694 2.5545518 -0.18185347 1.5359203
#> 6 -4.153019 2.6935451 -0.12510970 1.1421569

# Power computation using a pre-built sampler for the alternative
pow <- unif_stat_MC(n = 30, type = "all", p = 3, M = 1e2, cores = 2,
chunks = 10, r_H1 = r_alt, crit_val = sim$crit_val_MC, alt = "vMF", kappa = 1) pow$power_MC
#>      Ajne Bakshaev Bingham CCF09 CJ12 Gine_Fn Gine_Gn  PAD PCvM Poisson  PRt
#> 0.1  0.90     0.91    0.12  0.82 0.11    0.85    0.14 0.89 0.91    0.86 0.89
#> 0.05 0.84     0.85    0.09  0.78 0.10    0.84    0.08 0.85 0.85    0.77 0.85
#> 0.01 0.77     0.77    0.03  0.66 0.01    0.78    0.03 0.77 0.77    0.63 0.80
#>      Pycke Sobolev Softmax Stereo Rayleigh Rayleigh_HD Riesz
#> 0.1   0.89    0.11    0.87   0.58     0.90        0.90  0.91
#> 0.05  0.84    0.08    0.85   0.48     0.83        0.83  0.85
#> 0.01  0.67    0.02    0.77   0.30     0.78        0.78  0.77

# How to use a custom sampler according to ?unif_stat_MC
r_H1 <- function(n, p, M, l = 1) {

samp <- array(dim = c(n, p, M))
for (j in 1:M) {

samp[, , j] <- mvtnorm::rmvnorm(n = n, mean = c(l, rep(0, p - 1)),
sigma = diag(rep(1, p)))
samp[, , j] <- samp[, , j] / sqrt(rowSums(samp[, , j]^2))

}
return(samp)

}
pow <- unif_stat_MC(n = 30, type = "all", p = 3, M = 1e2, cores = 2,
chunks = 5, r_H1 = r_H1, crit_val = sim$crit_val_MC) pow$power_MC
#>      Ajne Bakshaev Bingham CCF09 CJ12 Gine_Fn Gine_Gn PAD PCvM Poisson PRt
#> 0.1     1        1    0.36     1 0.09       1    0.37   1    1       1   1
#> 0.05    1        1    0.29     1 0.08       1    0.24   1    1       1   1
#> 0.01    1        1    0.15     1 0.01       1    0.21   1    1       1   1
#>      Pycke Sobolev Softmax Stereo Rayleigh Rayleigh_HD Riesz
#> 0.1      1    0.21       1   0.97        1           1     1
#> 0.05     1    0.13       1   0.94        1           1     1
#> 0.01     1    0.03       1   0.82        1           1     1

unif_stat_MC can be used for constructing the Monte Carlo calibration necessary for unif_test, either for providing a rejection rule based on $crit_val_MC or for approximating the $$p$$-value by $stats_MC.

# Using precomputed critical values
ht1 <- unif_test(data = samps_sph[, , 1], type = "Rayleigh",
p_value = "crit_val", crit_val = sim$crit_val_MC) ht1 #> #> Rayleigh test of spherical uniformity #> #> data: samps_sph[, , 1] #> statistic = 3.1199, p-value = NA #> alternative hypothesis: mean direction different from zero ht1$reject
#>  0.1 0.05 0.01
#> TRUE TRUE TRUE

# Using precomputed Monte Carlo statistics
ht2 <- unif_test(data = samps_sph[, , 1], type = "Rayleigh",
p_value = "MC", stats_MC = sim$stats_MC) ht2 #> #> Rayleigh test of spherical uniformity #> #> data: samps_sph[, , 1] #> statistic = 3.1199, p-value = 0.98 #> alternative hypothesis: mean direction different from zero ht2$reject
#>  0.1 0.05 0.01
#> TRUE TRUE TRUE

# Faster than
unif_test(data = samps_sph[, , 1], type = "Rayleigh", p_value = "MC")
#>
#>  Rayleigh test of spherical uniformity
#>
#> data:  samps_sph[, , 1]
#> statistic = 3.1199, p-value = 0.3739
#> alternative hypothesis: mean direction different from zero

## References

García-Portugués, E., P. Navarro-Esteban, and J. A. Cuesta-Albertos. 2021. “A Cramér–von Mises Test of Uniformity on the Hypersphere.” In Statistical Learning and Modeling in Data Analysis, edited by S. Balzano, G. C. Porzio, R. Salvatore, D. Vistocco, and M. Vichi, 107–16. Studies in Classification, Data Analysis and Knowledge Organization. Cham: Springer. https://doi.org/10.1007/978-3-030-69944-4_12.
———. 2023. “On a Projection-Based Class of Uniformity Tests on the Hypersphere.” Bernoulli 29 (1): 181–204. https://doi.org/10.3150/21-BEJ1454.
García-Portugués, E., and T. Verdebout. 2018. “A Review of Uniformity Tests on the Hypersphere.” arXiv:1804.00286. https://arxiv.org/abs/1804.00286.
Pewsey, A., and E. García-Portugués. 2021. “Recent Advances in Directional Statistics.” Test 30 (1): 1–58. https://doi.org/10.1007/s11749-021-00759-x.

1. Uniformly-distributed polar coordinates do not translate into uniformly-distributed Cartesian coordinates!↩︎