[R] Large determinant problem
Prof Brian Ripley
ripley at stats.ox.ac.uk
Sun Dec 9 13:25:28 CET 2007
So S is numerically singular: the last 10 or 11 values are effectively 0.
As Peter Dalgaard said, scaling may help but only if the parameters are on
drastically different scales.
On Sun, 9 Dec 2007, maj at stats.waikato.ac.nz wrote:
> What you say about mixture models is true in general, however this fit was
> the best of 100 random EM starts. Unbounded likelihoods I believe are only
> a problem for continuous data mixture models and mine was discrete. Anyway
> it's nearly midnight now here so I'd better sleep. Before I go, here are
> the singular values:
>
>
>> svd(S)$d
> [1] 1.207593e+05 1.049068e+05 9.308082e+04 8.332758e+04 6.929102e+04
> [6] 6.323142e+04 5.977638e+04 5.723191e+04 4.375631e+04 2.723792e+04
> [11] 2.592586e+04 2.411705e+04 2.392963e+04 2.196578e+04 2.169200e+04
> [16] 2.123290e+04 2.054479e+04 1.948157e+04 1.927687e+04 1.777423e+04
> [21] 1.768510e+04 1.754492e+04 1.735954e+04 1.643881e+04 1.600038e+04
> [26] 1.588009e+04 1.584179e+04 1.419902e+04 1.401829e+04 1.332706e+04
> [31] 1.310741e+04 1.282854e+04 1.240196e+04 1.229453e+04 1.198187e+04
> [36] 1.168831e+04 1.069801e+04 1.063407e+04 1.060623e+04 1.056741e+04
> [41] 1.037193e+04 1.018307e+04 9.954778e+03 9.691297e+03 9.544900e+03
> [46] 9.353932e+03 9.084223e+03 9.023719e+03 8.538460e+03 8.260557e+03
> [51] 7.789166e+03 7.624562e+03 7.552246e+03 7.371003e+03 7.249892e+03
> [56] 7.170754e+03 7.143712e+03 7.041465e+03 7.019497e+03 6.918243e+03
> [61] 6.725985e+03 6.635220e+03 6.610919e+03 6.600485e+03 6.378983e+03
> [66] 6.255341e+03 6.252620e+03 5.944109e+03 5.890990e+03 5.875790e+03
> [71] 5.812950e+03 5.786653e+03 5.754739e+03 5.743921e+03 5.729494e+03
> [76] 5.588519e+03 5.558093e+03 5.511866e+03 5.447340e+03 5.436718e+03
> [81] 5.390440e+03 5.389862e+03 5.351446e+03 5.323460e+03 5.231327e+03
> [86] 5.154886e+03 5.146495e+03 5.103094e+03 5.062339e+03 5.016310e+03
> [91] 5.007371e+03 5.003195e+03 4.987950e+03 4.984937e+03 4.971855e+03
> [96] 4.963557e+03 4.913927e+03 4.891866e+03 4.845879e+03 4.841233e+03
> [101] 4.807681e+03 4.789150e+03 4.768244e+03 4.752387e+03 4.685244e+03
> [106] 4.667949e+03 4.662146e+03 4.655817e+03 4.615451e+03 4.542832e+03
> [111] 4.463354e+03 4.448647e+03 4.420757e+03 4.393323e+03 4.368262e+03
> [116] 4.330368e+03 4.322231e+03 4.280486e+03 4.269604e+03 4.266072e+03
> [121] 4.227934e+03 4.210416e+03 4.197196e+03 4.169111e+03 4.168029e+03
> [126] 4.145750e+03 4.137148e+03 4.117092e+03 4.102093e+03 4.031528e+03
> [131] 3.997150e+03 3.989493e+03 3.960800e+03 3.954143e+03 3.921214e+03
> [136] 3.892764e+03 3.861505e+03 3.831798e+03 3.821399e+03 3.816648e+03
> [141] 3.813275e+03 3.797050e+03 3.788435e+03 3.765362e+03 3.753526e+03
> [146] 3.750739e+03 3.717638e+03 3.704314e+03 3.700483e+03 3.683338e+03
> [151] 3.669548e+03 3.651310e+03 3.645356e+03 3.636891e+03 3.634490e+03
> [156] 3.631998e+03 3.598744e+03 3.578298e+03 3.577353e+03 3.492344e+03
> [161] 3.457991e+03 3.438116e+03 3.401560e+03 3.398088e+03 3.390086e+03
> [166] 3.362965e+03 3.328079e+03 3.306448e+03 3.289258e+03 3.283123e+03
> [171] 3.268046e+03 3.254232e+03 3.238759e+03 3.176306e+03 3.173192e+03
> [176] 3.145273e+03 3.132647e+03 3.124703e+03 3.116454e+03 3.028187e+03
> [181] 3.026404e+03 3.003130e+03 2.985991e+03 2.952215e+03 2.946402e+03
> [186] 2.937366e+03 2.902973e+03 2.867319e+03 2.855981e+03 2.843939e+03
> [191] 2.830485e+03 2.788518e+03 2.761445e+03 2.753757e+03 2.752846e+03
> [196] 2.725580e+03 2.723263e+03 2.669216e+03 2.640574e+03 2.545404e+03
> [201] 2.543216e+03 2.508090e+03 2.486351e+03 2.465191e+03 2.447437e+03
> [206] 2.431466e+03 2.424620e+03 2.423907e+03 2.399220e+03 2.369538e+03
> [211] 2.305238e+03 2.261185e+03 2.252992e+03 2.171784e+03 2.169940e+03
> [216] 2.127546e+03 2.094436e+03 2.074605e+03 2.056932e+03 2.053942e+03
> [221] 2.011659e+03 1.993672e+03 1.934327e+03 1.893751e+03 1.848455e+03
> [226] 1.838315e+03 1.763492e+03 1.728018e+03 1.726965e+03 1.623798e+03
> [231] 1.617925e+03 1.554590e+03 1.498835e+03 1.421876e+03 1.256465e+03
> [236] 1.200904e+03 1.118300e+03 1.101870e+03 1.055408e+03 9.238208e+02
> [241] 8.125509e+02 7.031272e+02 6.943645e+02 6.338677e+02 5.772709e+02
> [246] 5.077392e+02 4.566595e+02 4.025622e+02 3.118065e+02 3.043827e+02
> [251] 2.412400e+02 2.386435e+02 1.395932e+02 1.225108e+02 1.084912e+02
> [256] 9.846993e+01 8.964959e+01 8.446336e+01 2.486490e-05 5.362792e-11
> [261] 9.161356e-12 9.161356e-12 9.161356e-12 9.161356e-12 9.161356e-12
> [266] 9.161356e-12 9.161356e-12 9.161356e-12 9.161356e-12
>
> Murray
>
>> On Sun, 9 Dec 2007, maj at stats.waikato.ac.nz wrote:
>>
>>> I tried crossprod(S) but the results were identical. The term
>>> -0.5*log(det(S)) is a complexity penalty meant to make it unattractive
>>> to
>>> include too many components in a finite mixture model. This case was for
>>> a
>>> 9-component mixture. At least up to 6 components the determinant behaved
>>> as expected and increased with the number of components.
>>
>> And the singular values were?
>>
>> I am not surprised at this: if you have too many components some of them
>> may not be contributing to the fit or duplicating others: both lead to
>> numerically singular information matrices. In many mixture-fitting
>> problems the log-likelihood is unbounded but with many local maxima: it is
>> very easy to find a poor one.
>>
>>>
>>> Thanks for your comments.
>>>
>>>> Hmm, S'S is numerically singular. crossprod(S) would be a better way
>>>> to
>>>> compute it than crossprod(S,S) (it does use a different algorithm), but
>>>> look at the singular values of S, which I suspect will show that S is
>>>> numerically singular.
>>>>
>>>> Looks like the answer is 0.
>>>>
>>>>
>>>> On Sun, 9 Dec 2007, maj at stats.waikato.ac.nz wrote:
>>>>
>>>>> I thought I would have another try at explaining my problem. I think
>>>>> that
>>>>> last time I may have buried it in irrelevant detail.
>>>>>
>>>>> This output should explain my dilemma:
>>>>>
>>>>>> dim(S)
>>>>> [1] 1455 269
>>>>>> summary(as.vector(S))
>>>>> Min. 1st Qu. Median Mean 3rd Qu. Max.
>>>>> -1.160e+04 0.000e+00 0.000e+00 -4.132e-08 0.000e+00 8.636e+03
>>>>>> sum(as.vector(S)==0)/(1455*269)
>>>>> [1] 0.8451794
>>>>> # S is a large moderately sparse matrix with some large elements
>>>>>> SS <- crossprod(S,S)
>>>>>> (eigen(SS,only.values = TRUE)$values)[250:269]
>>>>> [1] 9.264883e+04 5.819672e+04 5.695073e+04 1.948626e+04
>>>>> 1.500891e+04
>>>>> [6] 1.177034e+04 9.696327e+03 8.037049e+03 7.134058e+03
>>>>> 1.316449e-07
>>>>> [11] 9.077244e-08 6.417276e-08 5.046411e-08 1.998775e-08
>>>>> -1.268081e-09
>>>>> [16] -3.140881e-08 -4.478184e-08 -5.370730e-08 -8.507492e-08
>>>>> -9.496699e-08
>>>>> # S'S fails to be non-negative definite.
>>>>>
>>>>> I can't show you how to produce S easily but below I attempt at a
>>>>> reproducible version of the problem:
>>>>>
>>>>>> set.seed(091207)
>>>>>> X <- runif(1455*269,-1e4,1e4)
>>>>>> p <- rbinom(1455*269,1,0.845)
>>>>>> Y <- p*X
>>>>>> dim(Y) <- c(1455,269)
>>>>>> YY <- crossprod(Y,Y)
>>>>>> (eigen(YY,only.values = TRUE)$values)[250:269]
>>>>> [1] 17951634238 17928076223 17725528630 17647734206 17218470634
>>>>> 16947982383
>>>>> [7] 16728385887 16569501198 16498812174 16211312750 16127786747
>>>>> 16006841514
>>>>> [13] 15641955527 15472400630 15433931889 15083894866 14794357643
>>>>> 14586969350
>>>>> [19] 14297854542 13986819627
>>>>> # No sign of negative eigenvalues; phenomenon must be due
>>>>> # to special structure of S.
>>>>> # S is a matrix of empirical parameter scores at an approximate
>>>>> # mle for a model with 269 paramters fitted to 1455 observations.
>>>>> # Thus, for example, its column sums are approximately zero:
>>>>>> summary(apply(S,2,sum))
>>>>> Min. 1st Qu. Median Mean 3rd Qu. Max.
>>>>> -1.148e-03 -2.227e-04 -7.496e-06 -6.011e-05 7.967e-05 8.254e-04
>>>>>
>>>>> I'm starting to think that it may not be a good idea to attempt to
>>>>> compute
>>>>> large information matrices and their determinants!
>>>>>
>>>>> Murray Jorgensen
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>
>>>> --
>>>> Brian D. Ripley, ripley at stats.ox.ac.uk
>>>> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
>>>> University of Oxford, Tel: +44 1865 272861 (self)
>>>> 1 South Parks Road, +44 1865 272866 (PA)
>>>> Oxford OX1 3TG, UK Fax: +44 1865 272595
>>>>
>>>>
>>>
>>>
>>
>> --
>> Brian D. Ripley, ripley at stats.ox.ac.uk
>> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
>> University of Oxford, Tel: +44 1865 272861 (self)
>> 1 South Parks Road, +44 1865 272866 (PA)
>> Oxford OX1 3TG, UK Fax: +44 1865 272595
>>
>>
>
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
More information about the R-help
mailing list