[R] EM norm package (NA/NaN/Inf in foreign function call (ar

Thu Aug 26 20:03:53 CEST 2004

On 26-Aug-04 tk at tariqkhan.org wrote:
> The following code should replicate the error by downloading
> the dataset from the internet (it is not too big):
> 
> library(norm)
> df<-download.file("http://www.tariqkhan.org/R/DataFromExcel.csv",
> "C:/Program Files/R/d.csv")
> mat<-as.matrix(read.table("C:/Program Files/R/d.csv", sep = ","))

I downloaded the dataset in my own way: 51x26 matrix with 166
missing values, right? -- and then:

  mat <- as.matrix(read.csv("DataFromExcel.csv"))
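
One difference between the two read calls is worth ruling out: read.csv()
defaults to header = TRUE, whereas read.table(..., sep = ",") treats the
first row as data unless it has one field fewer than the rest. The sketch
below uses a made-up three-line file (not the real dataset, and I am not
claiming the real file has a header row) to show how the two calls can
yield matrices of different shape and type:

```r
# Hypothetical toy file: a header row followed by two data rows.
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b", "1,2", "3,NA"), tmp)

# read.table: header row is read as data, so the columns are not numeric.
m1 <- as.matrix(read.table(tmp, sep = ","))

# read.csv: header = TRUE by default, so the first row becomes column names.
m2 <- as.matrix(read.csv(tmp))

dim(m1); is.numeric(m1)
dim(m2); is.numeric(m2)
```

Comparing dim(mat) and is.numeric(mat) on both machines would confirm
whether we are feeding the same matrix to prelim.norm.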

> s<-prelim.norm(mat)
> rngseed(1234567)

You don't need to set rngseed at this stage, since em.norm does
not require it; but never mind, it is needed if you go on to do
imputations.

> thetahat<-em.norm(s, maxits = 1000, criterion = 0.0035)
> 
> Iterations of EM:
> 1...2...3........348...349...Error: NA/NaN/Inf in foreign function call
> (arg 2)

I did not get this result: using the same command, em.norm terminated
normally after 82 iterations.

You can get your error message when a [nearly] singular matrix is
generated in the course of em.norm, since it has to invert a matrix
to compute the expected values of the missing components of the
sufficient statistics.
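
To illustrate what goes wrong when such a matrix has to be inverted, here
is a minimal base-R sketch. It does not use norm or your data: x1 and x2
are made-up variables, with x2 an exact multiple of x1 (the limiting case
of near collinearity). R's own solve() checks for singularity and stops
with an error; compiled code that lacks such a check may instead return
NaN or Inf, and R then complains as soon as those values are passed on to
the next foreign function call.

```r
# Illustrative data only: two perfectly collinear variables.
set.seed(42)
x1 <- rnorm(50)
x2 <- 2 * x1
S  <- cov(cbind(x1, x2))

# Ratio of largest to smallest singular value of the covariance matrix;
# it is enormous (possibly Inf) because S has rank 1.
d    <- svd(S)$d
cond <- d[1] / d[2]
cond

# solve() detects the singularity and signals an error, which is caught
# here and returned as a message string instead of a matrix.
inv <- tryCatch(solve(S), error = function(e) conditionMessage(e))
inv
```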

Having set rngseed as above, I then did

  mat.imp<-imp.norm(s,thetahat,mat)

after which

  svd(mat.imp)$d
   [1] 8.343633e+04 2.321644e-01 1.751089e-01 1.275187e-01
   [5] 1.116023e-01 8.807676e-02 8.006840e-02 6.198593e-02
   [9] 6.002220e-02 5.918019e-02 5.617467e-02 4.797701e-02
  [13] 4.631037e-02 4.239089e-02 3.917043e-02 3.786447e-02
  [17] 3.007310e-02 2.704916e-02 2.397084e-02 2.025846e-02
  [21] 1.681492e-02 1.336568e-02 9.161890e-03 6.042817e-03
  [25] 4.795948e-03 6.187377e-10

shows that the imputed matrix is close to 1-dimensional
and very nearly singular:
  the largest singular value is 8e+04,
  the next 10 are O(0.1),
  the next 14 are O(0.01),
  and the last one is O(1e-09).

So there is the potential for singularity problems. However,
as I say, I did not encounter any, so the behaviour you observe
is a bit puzzling.
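
To put rough numbers on the singular values listed above, the ratios can
be computed directly. The three values below are copied from the svd()
output; the variable names are mine. The ratio of largest to smallest is
the 2-norm condition number of the imputed matrix, and the condition
number of its cross-product t(X) %*% X is the square of that. This is only
a rough indicator, since I have not checked exactly which matrix em.norm
inverts internally.

```r
# First, second and last singular values from svd(mat.imp)$d above.
d_first  <- 8.343633e+04
d_second <- 2.321644e-01
d_last   <- 6.187377e-10

# Condition number of the imputed matrix (roughly 1.3e14).
cond <- d_first / d_last
cond

# How strongly the first singular value dominates the second.
dominance <- d_first / d_second
dominance

# Condition number of the cross-product matrix, against the limit of
# double precision.
cond^2
1 / .Machine$double.eps
```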

I observe that if I set "criterion = 0.000699" or greater
(compared with your 0.0035), then em.norm terminates normally
in 479 cycles or fewer, while for "criterion = 0.000698" or
less it goes the full 1000 cycles. But still no error message.
However, this does suggest that the maximum is not too well defined.

I'm using norm version 1.0-9, like you, with R version 1.8.1 on
Linux (so I did dos2unix on DataFromExcel.csv as well, but that
shouldn't matter).

Apart from the version of R, the only difference between us is
that you're running on Windows rather than Linux, but hopefully
that shouldn't matter either.

Hmmm.
Best wishes,
Ted.

> Someone else on the list found that using scale() helped with
> em.norm, but for me it only increased the number of iterations
> before giving the same error.
> 
> I don't get it. Insights into what I can do to solve this would
> be much appreciated!
> 
> Details:
> Norm Package version 1.0.9; 
> R version 1.9.0;
> Windows XP Pro 2002 SP1; 
> 384MB RAM, Pentium 4 CPU 2.40 GHz

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 167 1972
Date: 26-Aug-04                                       Time: 19:03:53
------------------------------ XFMail ------------------------------