# [R] determining the distribution of a sample data set etc.

Prof Brian Ripley ripley at stats.ox.ac.uk
Sat Nov 13 16:10:18 CET 2004

```
On Sat, 13 Nov 2004, Ann Huxtable wrote:

> Hello,
>
> I have only recently started using R. I have two data samples on which I want
> to carry out some initial exploratory data analysis, to:
>
> i). Determine the distribution of the data
> ii). Determine whether both datasets are from the same distribution.
>
> I have managed to create unit probability histograms and created qqplots for
> the data. I have attached one of the qqplots. It is clear that the data is

No plot made it to the list: see the posting guide for what attachments
are allowed.

> not from a normal distribution (it forms a convex curve underneath the
> straight line).  The nature of the curve suggests the data is from either a
> chi-square or an F distribution (if you think otherwise, I would appreciate
> your help in correcting my analysis).
>
> The point of this mail however, is how do I use R to:
>
> 1). Test if the data is from another distribution (F, chi-square, etc.)
> 2). How can I check if the samples are drawn from the same distribution?

I would use qqplots for both purposes.  qqplot will plot one dataset
against another: see its examples.  It will also plot a dataset against the
quantiles of a theoretical distribution: continuing that example

qqplot(y, qt(ppoints(200), df=5))

You could also compare two samples via the ecdfs and the
Kolmogorov-Smirnov test (examples in the MASS ch05.R script).  But formal
testing is not much help unless you know what sort of differences are
interesting _a priori_ -- you would need enormous samples to distinguish
a t_5 from a t_4, for example.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

```
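
The suggestions in the reply can be tried out with a short sketch along the following lines. The data here are simulated purely for illustration. The names `x` and `y`, the sample sizes, and the chi-squared candidate with 4 degrees of freedom are all assumptions, not anything taken from the original poster's data. The sketch draws a Q-Q plot of a sample against a candidate distribution, a two-sample Q-Q plot, overlaid empirical CDFs, and then runs the two-sample Kolmogorov-Smirnov test.

```r
# Illustrative data only: two simulated samples standing in for real data.
set.seed(1)
x <- rchisq(200, df = 4)   # hypothetical sample 1
y <- rchisq(150, df = 4)   # hypothetical sample 2

op <- par(mfrow = c(1, 3)) # three panels side by side

# 1) One sample against a candidate distribution (assumed: chi-squared, 4 df).
qqplot(qchisq(ppoints(length(x)), df = 4), x,
       xlab = "Chi-squared(4) quantiles", ylab = "Sample quantiles of x")
abline(0, 1, lty = 2)      # points close to this line suggest agreement

# 2) One sample against the other.
qq <- qqplot(x, y, xlab = "Quantiles of x", ylab = "Quantiles of y")
abline(0, 1, lty = 2)

# 3) Empirical CDFs of both samples on the same axes.
plot(ecdf(x), do.points = FALSE, main = "ECDFs of x and y")
plot(ecdf(y), do.points = FALSE, add = TRUE, col = "red")

par(op)                    # restore the previous graphics settings

# Two-sample Kolmogorov-Smirnov test of "same distribution".
ks <- ks.test(x, y)
print(ks)

# One-sample version against a fully specified distribution. If the
# parameters were estimated from the same data, the p-value is not valid.
ks1 <- ks.test(x, "pchisq", df = 4)
print(ks1)
```

As the reply cautions, a large p-value from either test only says that no difference was detected at these sample sizes. The plots are usually the more informative part.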