[R] Large determinant problem

Prof Brian Ripley ripley at stats.ox.ac.uk
Sun Dec 9 08:56:32 CET 2007


On Sun, 9 Dec 2007, maj at stats.waikato.ac.nz wrote:

> I tried crossprod(S) but the results were identical. The term
> -0.5*log(det(S)) is  a complexity penalty meant to make it unattractive to
> include too many components in a finite mixture model. This case was for a
> 9-component mixture. At least up to 6 components the determinant behaved
> as expected and increased with the number of components.

And the singular values were?

I am not surprised at this: if you have too many components some of them 
may not be contributing to the fit or duplicating others: both lead to 
numerically singular information matrices.  In many mixture-fitting 
problems the log-likelihood is unbounded but with many local maxima: it is 
very easy to find a poor one.

>
> Thanks for your comments.
>
>> Hmm, S'S is numerically singular.  crossprod(S) would be a better way to
>> compute it than crossprod(S,S) (it does use a different algorithm), but
>> look at the singular values of S, which I suspect will show that S is
>> numerically singular.
>>
>> Looks like the answer is 0.
>>
>>
>> On Sun, 9 Dec 2007, maj at stats.waikato.ac.nz wrote:
>>
>>> I thought I would have another try at explaining my problem. I think
>>> that
>>> last time I may have buried it in irrelevant detail.
>>>
>>> This output should explain my dilemma:
>>>
>>>> dim(S)
>>> [1] 1455  269
>>>> summary(as.vector(S))
>>>      Min.    1st Qu.     Median       Mean    3rd Qu.       Max.
>>> -1.160e+04  0.000e+00  0.000e+00 -4.132e-08  0.000e+00  8.636e+03
>>>> sum(as.vector(S)==0)/(1455*269)
>>> [1] 0.8451794
>>> # S is a large moderately sparse matrix with some large elements
>>>> SS <- crossprod(S,S)
>>>> (eigen(SS,only.values = TRUE)$values)[250:269]
>>> [1]  9.264883e+04  5.819672e+04  5.695073e+04  1.948626e+04
>>> 1.500891e+04
>>> [6]  1.177034e+04  9.696327e+03  8.037049e+03  7.134058e+03
>>> 1.316449e-07
>>> [11]  9.077244e-08  6.417276e-08  5.046411e-08  1.998775e-08
>>> -1.268081e-09
>>> [16] -3.140881e-08 -4.478184e-08 -5.370730e-08 -8.507492e-08
>>> -9.496699e-08
>>> # S'S fails to be non-negative definite.
>>>
>>> I can't show you how to produce S easily but below I attempt at a
>>> reproducible version of the problem:
>>>
>>>> set.seed(091207)
>>>> X <- runif(1455*269,-1e4,1e4)
>>>> p <- rbinom(1455*269,1,0.845)
>>>> Y <- p*X
>>>> dim(Y) <- c(1455,269)
>>>> YY <- crossprod(Y,Y)
>>>> (eigen(YY,only.values = TRUE)$values)[250:269]
>>> [1] 17951634238 17928076223 17725528630 17647734206 17218470634
>>> 16947982383
>>> [7] 16728385887 16569501198 16498812174 16211312750 16127786747
>>> 16006841514
>>> [13] 15641955527 15472400630 15433931889 15083894866 14794357643
>>> 14586969350
>>> [19] 14297854542 13986819627
>>> # No sign of negative eigenvalues; phenomenon must be due
>>> # to special structure of S.
>>> # S is a matrix of empirical parameter scores at an approximate
>>> # mle for a model with 269 paramters fitted to 1455 observations.
>>> # Thus, for example, its column sums are approximately zero:
>>>> summary(apply(S,2,sum))
>>>      Min.    1st Qu.     Median       Mean    3rd Qu.       Max.
>>> -1.148e-03 -2.227e-04 -7.496e-06 -6.011e-05  7.967e-05  8.254e-04
>>>
>>> I'm starting to think that it may not be a good idea to attempt to
>>> compute
>>> large information matrices and their determinants!
>>>
>>> Murray Jorgensen
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>> --
>> Brian D. Ripley,                  ripley at stats.ox.ac.uk
>> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
>> University of Oxford,             Tel:  +44 1865 272861 (self)
>> 1 South Parks Road,                     +44 1865 272866 (PA)
>> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>>
>>
>
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



More information about the R-help mailing list