[R] A somewhat off the line question to a log normal distribution

Thu Dec 2 16:53:03 CET 2004

On 02-Dec-04 David Whiting wrote:
> Robin Hankin <r.hankin at soc.soton.ac.uk> writes:
> 
>> [stuff about the CLT deleted]
>> 
>> >
>> > So you can use R usefully to evaluate general statistical
>> > issues of this kind!
>> >
>> 
>> absolutely!  R is excellent for this sort of thing.  I use
>> it for teaching stats all the time.
>> I'd say that without a tool like R you cannot learn statistics.
> 
> I believe Fisher and a few others managed to get by without it.

But the rest of us can depend quite heavily on groping through
instances until we see the light (I can remember doing crude
simulations using a slide rule and Kendall & Babington Smith's
"Tables of Random Normal Deviates" ... . You youngsters are
spoiled these days, with resources like R.)

Even the great, however, resorted to laborious simulations.
Student's pioneering paper "On the probable error of a mean"
(Biometrika 1908) gives the analytical form of the t
distribution, though not quite in its modern formulation: he
used z = (sample mean - population mean)/(sample SD) rather than t.
In the paper he obtains it by laboriously evaluating the analytical
moments of the numerator and of the denominator, and their
correlation (showing this to be zero and hence "inferring"
independence); he can then analytically integrate their "joint
distribution" to obtain the equation of the z-distribution.

But, in a later section, he writes:

  "Before I had succeeded in solving my problem analytically,
   I had endeavoured to do so empirically. The material used
   was a correlation table containing the height and left
   middle finger measurements of 3000 criminals, from a paper
   by W. R. Macdonell (Biometrika, I, p. 219). The measurements
   were written out on 3000 pieces of cardboard, which were then
   very thoroughly shuffled and drawn at random. As each card
   was drawn its numbers were written down in a book, which thus
   contains the measurements of 3000 criminals in a random order.
   Finally, each consecutive set of 4 was taken as a sample--750
   in all--and the mean, standard deviation, and correlation of
   each sample determined. The difference between the mean of
   each sample and the mean of the population was then divided
   by the standard deviation of the sample, giving us the z
   of Section III. This provides us with two sets of 750 standard
   deviations and two sets of 750 z's on which to test the
   theoretical results arrived at."

In this paper he uses these results only as a test of his
theoretical formula. However, I seem to recall (someone may be
able to confirm or refute this) that originally he had used such
a sampling simulation to obtain the first 4 empirical moments
of the distribution of his z, and used these to identify the
distribution as a "Pearson Type VII", which is, in effect, the
t-distribution. That would be consistent with his statement
"Before I had succeeded in solving my problem analytically,
I had endeavoured to do so empirically". If true, this would be
an instance of one of the great having been led to the truth by
experimental exploration of the kind being discussed.
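Student's card-shuffling experiment is easy to mimic on a computer, which is rather the point of this thread. Below is a minimal sketch in Python (R would do just as well). It assumes synthetic standard-normal values in place of Macdonell's real measurements, and the function name `student_experiment` is a hypothetical helper, not anything from the paper. Following the 1908 paper, the sample SD is taken with divisor n, and the "population mean" is the mean of all the cards.

```python
import math
import random
import statistics


def student_experiment(population, n=4, seed=1908):
    """Mimic Student's 1908 card experiment (hypothetical helper).

    Shuffle the "cards", cut the deck into consecutive samples of size n,
    and return z = (sample mean - population mean) / sample SD for each,
    with the sample SD computed using divisor n as in the 1908 paper.
    """
    rng = random.Random(seed)
    cards = list(population)
    rng.shuffle(cards)  # "very thoroughly shuffled and drawn at random"
    mu = statistics.fmean(cards)  # mean of the whole "population" of cards
    zs = []
    for start in range(0, len(cards) - n + 1, n):
        sample = cards[start:start + n]
        s = statistics.pstdev(sample)  # divisor n, not n - 1
        zs.append((statistics.fmean(sample) - mu) / s)
    return zs


if __name__ == "__main__":
    # Assumption: synthetic standard-normal values stand in for the
    # 3000 real measurements used by Student.
    gen = random.Random(0)
    population = [gen.gauss(0.0, 1.0) for _ in range(3000)]
    zs = student_experiment(population, n=4)
    print(len(zs))  # 750 samples of 4, as in the paper

    # t = z * sqrt(n - 1) has n - 1 = 3 degrees of freedom; 3.182 is the
    # two-sided 5% point of t on 3 df, so about 5% of |z| should exceed crit.
    crit = 3.182 / math.sqrt(3)
    print(sum(abs(z) > crit for z in zs) / len(zs))
```

With only 750 samples the printed proportion will wander somewhat around 0.05 from seed to seed, much as Student's own 750 z's could only roughly confirm his formula.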

As a further historical snippet: Fisher, as a Cambridge student
whose tutor was F.J.M. Stratton, noticed an apparent discrepancy
between Student's results and what he had worked out for himself
and drew this to the attention of Stratton (who knew Student).
On Stratton's suggestion Fisher contacted Student in 1912, and out
of that correspondence came a correct proof. Student himself
wrote at one point to Stratton, somewhat complaining of "two
foolscap pages covered with mathematics of the deepest dye"
which had been sent him by "this chap Fisher". (The letter in
fact concerns a later communication from Fisher, which Student
thought "so nice and mathematical that it might appeal to some people.")

(See "R.A. Fisher: the Life of a Scientist" by Joan Fisher Box).

Sorry to be drifting off-topic again, but I couldn't resist
"this chap Fisher".

Best wishes to all,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861  [NB: New number!]
Date: 02-Dec-04                                       Time: 15:53:03
------------------------------ XFMail ------------------------------