[R] test for whether dataset comes from a known MVN

Fri Oct 12 19:33:04 CEST 2007

Desmond Campbell wrote:
> 
> Dear Ben Bolker,
> 
> Thanks for replying and offering advice. Unfortunately it doesn't solve my
> problem.
> 
> 1) mshapiro.test() in the mvnormtest package appears to be applicable only
> to datasets containing 3 to 5000 samples, whereas my dataset contains
> 100,000 samples.
> 
> 2) As you said in your email, if my data is from the real world then any
> test is likely to reject the null hypothesis, because of the power of such
> a large dataset.
> 
> However, my data is not from the real world. I am conducting validation
> studies, and if the program I am testing is working correctly then the
> dataset will be perfectly normally distributed.
> 
> Thanks anyway.
> 
> 

   In this case I would be tempted to contact the package author and find
out what limits the size of the input data set.  It does look like the
method requires a matrix inversion, in which case you might be in big
trouble (if the matrix were sparse you could try substituting SparseM
functions, but I rather doubt it is).
   Do you know whether anyone has come up with a method that will do this
test for a data set of this size?  That is, is this a problem of developing
a statistical method or a problem of implementation in R?  (Are the methods
discussed in http://support.sas.com/ctx/samples/index.jsp?sid=480 or
http://interstat.statjournals.net/YEAR/2003/articles/0301001.pdf such
as Mardia's multivariate skewness or kurtosis appropriate and less
numerically intensive?  I don't know how to calculate multivariate (MV)
skewness, and an R site search brings up a lot about the MV skew-normal
distribution but not much about MV skewness itself.  I found an SPSS macro
at http://www.columbia.edu/~ld208/Mardia.sps but that's as far as I got.)
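
   For what it's worth, a rough sketch of Mardia's two statistics in R
might look like the following.  The function name mardia_stats is made up
for illustration.  The usual definition of the skewness sums over all
n^2 pairs of observations, which is hopeless for n = 100,000, but it is
algebraically equal to the sum of squared third moments of the whitened
data, which costs only O(n p^3).  The sketch estimates the mean and
covariance from the sample (it does not plug in the known MVN parameters)
and relies on the standard large-sample null distributions, so treat it as
a starting point rather than a vetted implementation.

```r
# Mardia's multivariate skewness (b1) and kurtosis (b2), avoiding the n x n matrix.
# mardia_stats is a hypothetical helper name, not a function from any package.
mardia_stats <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  p <- ncol(x)

  xc <- sweep(x, 2, colMeans(x))   # centre each column
  S  <- crossprod(xc) / n          # ML covariance estimate (divisor n)
  z  <- xc %*% solve(chol(S))      # whitened data, so crossprod(z) / n = identity

  # Kurtosis: mean of squared Mahalanobis distances, squared again.
  b2 <- mean(rowSums(z^2)^2)

  # Skewness: sum over (a, b, k) of the squared third moments of z.
  b1 <- 0
  for (k in seq_len(p)) {
    m3 <- crossprod(z, z * z[, k]) / n   # p x p slice of the third-moment array
    b1 <- b1 + sum(m3^2)
  }

  # Large-sample null distributions under multivariate normality.
  skew_stat <- n * b1 / 6
  skew_df   <- p * (p + 1) * (p + 2) / 6
  kurt_z    <- (b2 - p * (p + 2)) / sqrt(8 * p * (p + 2) / n)

  list(b1 = b1, b2 = b2,
       p_skew = pchisq(skew_stat, df = skew_df, lower.tail = FALSE),
       p_kurt = 2 * pnorm(-abs(kurt_z)))
}

# Example on simulated data of the size mentioned in the thread.
set.seed(1)
x <- matrix(rnorm(100000 * 4), ncol = 4)
str(mardia_stats(x))
```

Only one p x p covariance matrix is inverted here, so the size of the data
set affects run time roughly linearly rather than through a huge matrix.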
   Do you have to test the whole data set at once?  Could you hack it
by testing disjoint subsets of the data and then combining the p-values,
e.g. with Fisher's method?
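
   A minimal sketch of the subset idea, assuming the subsets are disjoint
(so the p-values are independent) and that each is small enough for the
test being used.  The helper names fisher_combine and subset_pvalues are
invented for this example.  The mshapiro.test() call assumes, as I
understand its interface, that the function wants variables in rows and
returns an object with a p.value component; check the package
documentation before relying on that.

```r
# Fisher's method: -2 * sum(log(p)) is chi-squared with 2k df under the null.
fisher_combine <- function(p) {
  stat <- -2 * sum(log(p))
  pchisq(stat, df = 2 * length(p), lower.tail = FALSE)
}

# Split the rows of x into disjoint random chunks and apply test_fun to each.
# test_fun must take a matrix (observations in rows) and return one p-value.
# If nrow(x) is not a multiple of chunk_size the last chunk is smaller.
subset_pvalues <- function(x, chunk_size, test_fun) {
  x <- as.matrix(x)
  idx <- sample(nrow(x))   # shuffle so chunks do not follow the row order
  groups <- split(idx, ceiling(seq_along(idx) / chunk_size))
  vapply(groups, function(i) test_fun(x[i, , drop = FALSE]), numeric(1))
}

# Usage with mvnormtest, only if that package is installed.
if (requireNamespace("mvnormtest", quietly = TRUE)) {
  set.seed(1)
  x <- matrix(rnorm(20000 * 3), ncol = 3)
  p <- subset_pvalues(x, 5000,
                      function(m) mvnormtest::mshapiro.test(t(m))$p.value)
  print(fisher_combine(p))
}
```

With 100,000 rows and chunks of 5000 this gives 20 p-values to combine.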

  cheers
   Ben Bolker
