Things that can go wrong

Carl James Schwarz

2024-01-25

1 Things that can go wrong.

There are several types of problem that can occur that make the use of SPAS problematic.

1.1 Rows are proportional

If the row in the movement matrix are proportional, then there an infinite number of solutions. Note it is the singularity of the movement matrix that is problematic; the number of tags releases and not recovered (last column of the input data) can be proportional without affecting the fit.

The model may converge to different solutions depending on your set up. There is no automatic “flagging” of this situation and you need to be vigilant. A diagnostic tool is the condition number of \(XX'\) where \(X\) is the movement matrix. If the rows are exactly proportional (or more generally if the rows are colinear) the condition number will be \(+\infty\).

The solution is to pool the row(s) that are proportional.

Here are some examples:


test.data.csv <- textConnection("
 160   ,   120   ,     72   ,     82   ,   3592
  80   ,    60   ,     36   ,     41   ,    532
7960   ,  9720   ,   6264   ,   7934   ,   0  ")

test.data <- as.matrix(read.csv(test.data.csv, header=FALSE, strip.white=TRUE))
test.data
#>        V1   V2   V3   V4   V5
#> [1,]  160  120   72   82 3592
#> [2,]   80   60   36   41  532
#> [3,] 7960 9720 6264 7934    0

mod..1 <- SPAS.fit.model(test.data,
                       model.id="No restrictions",
                       row.pool.in=1:2, col.pool.in=1:4)
#> Using nlminb to find conditional MLE
#> outer mgc:  21233.77 
#> outer mgc:  31337.35 
#> outer mgc:  29620.78 
#> outer mgc:  7736.026 
#> outer mgc:  1637.496 
#> outer mgc:  77.12751 
#> outer mgc:  4.863 
#> outer mgc:  0.1203054 
#> outer mgc:  0.04505754 
#> outer mgc:  0.01671065 
#> Convergence codes from nlminb  0 relative convergence (4) 
#> Finding conditional estimate of N

SPAS.print.model(mod..1)
#> Model Name: No restrictions 
#>    Date of Fit: 2024-01-25 12:19 
#>    Version of OPEN SPAS used : SPAS-R 2023-03-31 
#>  
#> Raw data 
#>        V1   V2   V3   V4   V5
#> [1,]  160  120   72   82 3592
#> [2,]   80   60   36   41  532
#> [3,] 7960 9720 6264 7934    0
#> 
#> Row pooling setup : 1 2 
#> Col pooling setup : 1 2 3 4 
#> Physical pooling  : TRUE 
#> Theta pooling     : FALSE 
#> CJS pooling       : FALSE 
#> 
#> 
#> Chapman estimator of population size  238286  (SE  8578  )
#>  
#> 
#> Raw data AFTER PHYSICAL (but not logical) POOLING 
#>       pool1 pool2 pool3 pool4   V5
#> pool1   160   120    72    82 3592
#> pool2    80    60    36    41  532
#>        7960  9720  6264  7934    0
#> 
#> Condition number of XX' where X= (physically) pooled matrix is  Inf 
#> Condition number of XX' after logical pooling                   Inf 
#> 
#> Large value of kappa (>1000) indicate that rows are approximately proportional which is not good
#> 
#>   Conditional   Log-Likelihood: 285758.5    ;  np: 12 ;  AICc: -571493 
#> 
#>   Code/Message from optimization is:  0 relative convergence (4) 
#> 
#> Estimates
#>               pool1  pool2  pool3  pool4  psi cap.prob exp factor Pop Est
#> pool1         109.1  132.2   85.1  107.7 3592    0.013       73.5  299742
#> pool2          80.0   60.0   36.0   41.0  532    1.000        0.0     749
#> est unmarked 8011.0 9708.0 6251.0 7908.0    0       NA         NA  300491
#> 
#> SE of above estimates
#>              pool1 pool2 pool3 pool4  psi cap.prob exp factor Pop Est
#> pool1          5.3   6.4   4.2   5.3 59.9    0.001        3.5   14291
#> pool2          8.9   7.7   6.0   6.4 23.1    0.000        0.0       0
#> est unmarked    NA    NA    NA    NA  0.0       NA         NA   13499
#> 
#> 
#> Chisquare gof cutoff  : 0.1 
#> Chisquare gof value   : 33.49443 
#> Chisquare gof df      : 2 
#> Chisquare gof p       : 5.330621e-08

In this case, it appears that the model has converged, but this is ONE of many possible solutions and cannot be relied upon.

The condition number is very large (!) as expected and shown in the above output:

# Compute the condition number of XX'

XX <- test.data[1:2, 1:4] %*% t(test.data[1:2, 1:4])
XX
#>       [,1]  [,2]
#> [1,] 51908 25954
#> [2,] 25954 12977
cat("\n\nCondition number is\n")
#> 
#> 
#> Condition number is
kappa(XX)
#> [1] Inf

We will either physically or logically pool rows:

mod..2 <- SPAS.fit.model(test.data,
                       model.id="No restrictions",
                       row.pool.in=c(1,1), col.pool.in=1:4, 
                       row.physical.pool=FALSE)
#> Using nlminb to find conditional MLE
#> outer mgc:  31861.11 
#> outer mgc:  31027.39 
#> outer mgc:  27743.45 
#> outer mgc:  64891.49 
#> outer mgc:  17714.73 
#> outer mgc:  2938.93 
#> outer mgc:  388.7135 
#> outer mgc:  1367.762 
#> outer mgc:  58.7506 
#> outer mgc:  33.22109 
#> outer mgc:  0.2950866 
#> outer mgc:  0.001589974 
#> Convergence codes from nlminb  0 relative convergence (4) 
#> Finding conditional estimate of N

SPAS.print.model(mod..2)
#> Model Name: No restrictions 
#>    Date of Fit: 2024-01-25 12:19 
#>    Version of OPEN SPAS used : SPAS-R 2023-03-31 
#>  
#> Raw data 
#>        V1   V2   V3   V4   V5
#> [1,]  160  120   72   82 3592
#> [2,]   80   60   36   41  532
#> [3,] 7960 9720 6264 7934    0
#> 
#> Row pooling setup : 1 1 
#> Col pooling setup : 1 2 3 4 
#> Physical pooling  : FALSE 
#> Theta pooling     : FALSE 
#> CJS pooling       : FALSE 
#> 
#> 
#> Chapman estimator of population size  238286  (SE  8578  )
#>  
#> 
#> Raw data AFTER PHYSICAL (but not logical) POOLING 
#>        pool1 pool2 pool3 pool4   V5
#> pool.1   160   120    72    82 3592
#> pool.1    80    60    36    41  532
#>         7960  9720  6264  7934    0
#> 
#> Condition number of XX' where X= (physically) pooled matrix is  Inf 
#> Condition number of XX' after logical pooling                   1 
#> 
#> Large value of kappa (>1000) indicate that rows are approximately proportional which is not good
#> 
#>   Conditional   Log-Likelihood: 285750.9    ;  np: 11 ;  AICc: -571479.7 
#> 
#>   Code/Message from optimization is:  0 relative convergence (4) 
#> 
#> Estimates
#>               pool1  pool2  pool3  pool4  psi cap.prob exp factor Pop Est
#> pool.1        109.4  132.1   85.0  107.5 3592     0.02         49  201170
#> pool.1         54.7   66.0   42.5   53.7  532     0.02         49   37426
#> est unmarked 8036.0 9702.0 6244.0 7896.0    0       NA         NA  238596
#> 
#> SE of above estimates
#>              pool1 pool2 pool3 pool4  psi cap.prob exp factor Pop Est
#> pool.1         6.7   8.7   6.7   8.1 59.9    0.001        1.9    7805
#> pool.1         5.5   7.4   6.0   7.2 23.1    0.001        1.9    1452
#> est unmarked    NA    NA    NA    NA  0.0       NA         NA    8603
#> 
#> 
#> Chisquare gof cutoff  : 0.1 
#> Chisquare gof value   : 49.81373 
#> Chisquare gof df      : 3 
#> Chisquare gof p       : 8.753258e-11

This has solved the colinearity problem.

1.2 Rows are approximately proportional

Of course with real data, it is highly unlikely that the rows in the recovery matrix will be exactly proportional.

The model may converge to different solutions depending on your set up. There is no automatic “flagging” of this situation and you need to be vigilant. The condition number of \(XX'\) may be useful.

The solution is to pool the row(s) that are proportional.

Here are some examples where the entries in row 2 are modified slightly to make the rows only approximately proportional


test.data.csv <- textConnection("
 160   ,   120   ,     72   ,     82   ,   3592
  75   ,    62   ,     38   ,     35   ,    532
7960   ,  9720   ,   6264   ,   7934   ,   0  ")

test.data <- as.matrix(read.csv(test.data.csv, header=FALSE, strip.white=TRUE))
test.data
#>        V1   V2   V3   V4   V5
#> [1,]  160  120   72   82 3592
#> [2,]   75   62   38   35  532
#> [3,] 7960 9720 6264 7934    0

mod..2 <- SPAS.fit.model(test.data,
                       model.id="No restrictions",
                       row.pool.in=1:2, col.pool.in=1:4)
#> Using nlminb to find conditional MLE
#> outer mgc:  21468.58 
#> outer mgc:  31261.98 
#> outer mgc:  29306.81 
#> outer mgc:  8081.517 
#> outer mgc:  1632.478 
#> outer mgc:  75.28523 
#> outer mgc:  4.412961 
#> outer mgc:  0.1427219 
#> outer mgc:  0.05345935 
#> outer mgc:  0.01986673 
#> Convergence codes from nlminb  0 relative convergence (4) 
#> Finding conditional estimate of N

SPAS.print.model(mod..2)
#> Model Name: No restrictions 
#>    Date of Fit: 2024-01-25 12:19 
#>    Version of OPEN SPAS used : SPAS-R 2023-03-31 
#>  
#> Raw data 
#>        V1   V2   V3   V4   V5
#> [1,]  160  120   72   82 3592
#> [2,]   75   62   38   35  532
#> [3,] 7960 9720 6264 7934    0
#> 
#> Row pooling setup : 1 2 
#> Col pooling setup : 1 2 3 4 
#> Physical pooling  : TRUE 
#> Theta pooling     : FALSE 
#> CJS pooling       : FALSE 
#> 
#> 
#> Chapman estimator of population size  240468  (SE  8710  )
#>  
#> 
#> Raw data AFTER PHYSICAL (but not logical) POOLING 
#>       pool1 pool2 pool3 pool4   V5
#> pool1   160   120    72    82 3592
#> pool2    75    62    38    35  532
#>        7960  9720  6264  7934    0
#> 
#> Condition number of XX' where X= (physically) pooled matrix is  1785.545 
#> Condition number of XX' after logical pooling                   1785.545 
#> 
#> Large value of kappa (>1000) indicate that rows are approximately proportional which is not good
#> 
#>   Conditional   Log-Likelihood: 285730.4    ;  np: 12 ;  AICc: -571436.7 
#> 
#>   Code/Message from optimization is:  0 relative convergence (4) 
#> 
#> Estimates
#>               pool1  pool2  pool3  pool4  psi cap.prob exp factor Pop Est
#> pool1         109.1  132.2   85.1  107.7 3592    0.013       73.5  299742
#> pool2          75.0   62.0   38.0   35.0  532    1.000        0.0     742
#> est unmarked 8011.0 9708.0 6251.0 7908.0    0       NA         NA  300484
#> 
#> SE of above estimates
#>              pool1 pool2 pool3 pool4  psi cap.prob exp factor Pop Est
#> pool1          5.3   6.4   4.2   5.3 59.9    0.001        3.5   14291
#> pool2          8.7   7.9   6.2   5.9 23.1    0.000        0.0       0
#> est unmarked    NA    NA    NA    NA  0.0       NA         NA   13499
#> 
#> 
#> Chisquare gof cutoff  : 0.1 
#> Chisquare gof value   : 33.49443 
#> Chisquare gof df      : 2 
#> Chisquare gof p       : 5.33061e-08

# Compute the condition number of XX'

XX <- test.data[1:2, 1:4] %*% t(test.data[1:2, 1:4])
XX
#>       [,1]  [,2]
#> [1,] 51908 25046
#> [2,] 25046 12138
cat("\n\nCondition number is\n")
#> 
#> 
#> Condition number is
kappa(XX)
#> [1] 1785.545

Now there is only one solution, but the estimate is very sensitive to small changes in the data. Notice that the condition number is still very large.

2 Columns that are all zero

In theory this should have no influence on the fit (see the section on pooling columns).


test.data.csv <- textConnection("
 160   ,   120   ,     72   ,     82   ,   0, 3592
 100   ,    45   ,     39   ,     90   ,   0,  532
7960   ,  9720   ,   6264   ,   7934   ,   0,    0  ")

test.data <- as.matrix(read.csv(test.data.csv, header=FALSE, strip.white=TRUE))
test.data
#>        V1   V2   V3   V4 V5   V6
#> [1,]  160  120   72   82  0 3592
#> [2,]  100   45   39   90  0  532
#> [3,] 7960 9720 6264 7934  0    0

mod..3 <- SPAS.fit.model(test.data,
                       model.id="No restrictions",
                       row.pool.in=1:2, col.pool.in=1:5)
#> Using nlminb to find conditional MLE
#> outer mgc:  19793.78 
#> outer mgc:  31172.76 
#> outer mgc:  29040.42 
#> outer mgc:  25712.46 
#> outer mgc:  5362.637 
#> outer mgc:  496.3737 
#> outer mgc:  8.392982 
#> outer mgc:  0.2236606 
#> outer mgc:  0.1492354 
#> outer mgc:  0.05639175 
#> outer mgc:  0.02111307 
#> outer mgc:  0.007818906 
#> outer mgc:  0.002723444 
#> outer mgc:  0.0009406725 
#> Convergence codes from nlminb  0 relative convergence (4) 
#> Finding conditional estimate of N

SPAS.print.model(mod..3)
#> Model Name: No restrictions 
#>    Date of Fit: 2024-01-25 12:19 
#>    Version of OPEN SPAS used : SPAS-R 2023-03-31 
#>  
#> Raw data 
#>        V1   V2   V3   V4 V5   V6
#> [1,]  160  120   72   82  0 3592
#> [2,]  100   45   39   90  0  532
#> [3,] 7960 9720 6264 7934  0    0
#> 
#> Row pooling setup : 1 2 
#> Col pooling setup : 1 2 3 4 5 
#> Physical pooling  : TRUE 
#> Theta pooling     : FALSE 
#> CJS pooling       : FALSE 
#> 
#> 
#> Chapman estimator of population size  222133  (SE  7617  )
#>  
#> 
#> Raw data AFTER PHYSICAL (but not logical) POOLING 
#>       pool1 pool2 pool3 pool4 pool5   V6
#> pool1   160   120    72    82     0 3592
#> pool2   100    45    39    90     0  532
#>        7960  9720  6264  7934     0    0
#> 
#> Condition number of XX' where X= (physically) pooled matrix is  46.8607 
#> Condition number of XX' after logical pooling                   46.8607 
#> 
#> Large value of kappa (>1000) indicate that rows are approximately proportional which is not good
#> 
#>   Conditional   Log-Likelihood: 286003.7    ;  np: 14 ;  AICc: -571979.3 
#> 
#>   Code/Message from optimization is:  0 relative convergence (4) 
#> 
#> Estimates
#>               pool1  pool2  pool3  pool4 pool5  psi cap.prob exp factor Pop Est
#> pool1         109.1  132.2   85.1  107.7     0 3592    0.013       73.5  299742
#> pool2         100.0   45.0   39.0   90.0     0  532    1.000        0.0     806
#> est unmarked 8011.0 9708.0 6251.0 7908.0     0    0       NA         NA  300548
#> 
#> SE of above estimates
#>              pool1 pool2 pool3 pool4 pool5  psi cap.prob exp factor Pop Est
#> pool1          5.3   6.4   4.2   5.3     0 59.9    0.001        3.5   14291
#> pool2         10.0   6.7   6.2   9.5     0 23.1    0.000        0.0       0
#> est unmarked    NA    NA    NA    NA    NA  0.0       NA         NA   13499
#> 
#> 
#> Chisquare gof cutoff  : 0.1 
#> Chisquare gof value   : 33.49436 
#> Chisquare gof df      : 3 
#> Chisquare gof p       : 2.533076e-07

XX <- test.data[1:2, 1:5] %*% t(test.data[1:2, 1:5])
XX
#>       [,1]  [,2]
#> [1,] 51908 31588
#> [2,] 31588 21646
cat("\n\nCondition number is\n")
#> 
#> 
#> Condition number is
kappa(XX)
#> [1] 46.8607

Notice that the column of 0’s does not affect the fit and has no impact on the condition number of \(XX'\).

3 References

Darroch, J. N. (1961). The two-sample capture-recapture census when tagging and sampling are stratified. Biometrika, 48, 241–260. https://www.jstor.org/stable/2332748

Plante, N., L.-P Rivest, and G. Tremblay. (1988). Stratified Capture-Recapture Estimation of the Size of a Closed Population. Biometrics 54, 47-60. https://www.jstor.org/stable/2533994

Schwarz, C. J., & Taylor, C. G. (1998). The use of the stratified-Petersen estimator in fisheries management with an illustration of estimating the number of pink salmon (Oncorhynchus gorbuscha) that return to spawn in the Fraser River. Canadian Journal of Fisheries and Aquatic Sciences, 55, 281–296. https://doi.org/10.1139/f97-238