# [R] Warning - Naive Question Alert

Bill Venables Bill.Venables at cmis.csiro.au
Sun Aug 27 02:54:43 CEST 2000

```
At 11:08 26/08/00 -0700, Marc R. Feldesman wrote:
>In my research area, researchers are very stingy about sharing data.  In
>fact, they are frequently downright secretive about these
>morsels.  Typically, we get fed a diet of summary statistics and the
>assurance that the data are *normal* without any necessary documentation
>that they really are.
>
>With that as background and assuming that the data really are normal, is
>there any way in R (or any of the S engines) to generate a data set that
>mimics exactly the summary properties reported in a published paper?  I
>know I can use rnorm() and mvrnorm() for this, but neither function will
>necessarily or very likely return a sample that has the *same* properties
>as the given population.  At best, I can sift through replicates until I
>find the one closest to the "original".   This approach doesn't seem very
>efficient or valid.  Is there another way to do this?
>

It's possible to give an answer to this that is in principle complete but
in practice almost useless:  You can do it if the summary statistics are
sufficient.  In that case the conditional distribution of the sample given
the sufficient statistics is independent of the parameter vector, so in
principle what you have to do is generate an instance of the conditional
distribution, put it together with the sufficient statistic and
re-constitute the instance of the original sample.

For a single normal sample the idea is simple enough.  Generate any normal
sample, standardize it by subtracting its sample mean and dividing by its
sample standard deviation.  This generates an instance of the conditional
distribution, which in this case is a uniform distribution over the
(n-2)-dimensional sphere.  Now just multiply by the summary statistic
standard deviation and add on the summary statistic mean and you have it.
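
# A minimal sketch of the single-sample recipe above, in R.  The target
# values (n, target.mean, target.sd) are made up for illustration; in
# practice they would be the figures reported in the paper.
n           <- 25
target.mean <- 10
target.sd   <- 2

z <- rnorm(n)                     # any normal sample will do
z <- (z - mean(z)) / sd(z)        # now mean(z) is 0 and sd(z) is 1
x <- target.mean + target.sd * z  # rescale to the reported summaries

# The sample mean and standard deviation of x match the targets up to
# floating-point rounding.
c(mean(x), sd(x))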

Extending this to the multivariate case should not be all that difficult.
One way you could do it is to generate any old multivariate normal sample of
the right dimension, "standardize" it by subtracting its sample mean and
multiplying by the inverse of a symmetric square root of its sample variance
matrix, then multiply by the same sort of square root of the given summary
statistic variance matrix and add back on the given sample mean.  [Warning:
I am doing this at the keyboard, ideally I would like a bit more time to
think about it...but I don't have that sort of time right now.  Caveat
emptor!]
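
# A rough sketch of the multivariate recipe above, in R, offered in the
# same caveat-emptor spirit.  It assumes n > p, so that the sample
# variance matrix of the generated data is invertible.  msqrt() is a
# helper defined here (not a base R function), and the target mean and
# variance matrix are made up for illustration.
msqrt <- function(S) {
  # symmetric square root via the eigendecomposition of S
  e <- eigen(S, symmetric = TRUE)
  k <- length(e$values)
  e$vectors %*% diag(sqrt(e$values), nrow = k) %*% t(e$vectors)
}

n           <- 50
target.mean <- c(1, 2, 3)
target.var  <- matrix(c(4, 1, 0,
                        1, 3, 1,
                        0, 1, 2), 3, 3)
p <- length(target.mean)

Z <- matrix(rnorm(n * p), n, p)   # any multivariate normal sample
Z <- sweep(Z, 2, colMeans(Z))     # subtract the column means
Z <- Z %*% solve(msqrt(var(Z)))   # var(Z) is now the identity matrix
X <- sweep(Z %*% msqrt(target.var), 2, target.mean, "+")

# colMeans(X) and var(X) match the targets up to floating-point rounding.
colMeans(X)
var(X)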

>
>Dr. Marc R. Feldesman
>email:  feldesmanm at pdx.edu
>email:  feldesman at ibm.net
>fax:    503-725-3905
>
>"Don't know where I'm going.
> Don't like where I've been.
> There may be no exit.
> But hell, I'm going in."  Jimmy Buffett

"Work like you don't need the money,
dance like there's no one watching,
love like you've never been hurt."  (Anon.)