[R] Inference for R Spam

Rolf Turner r.turner at auckland.ac.nz
Wed Mar 4 21:09:45 CET 2009


On 5/03/2009, at 4:54 AM, Michael A. Miller wrote:

>>>>>> "Rolf" == Rolf Turner <r.turner at auckland.ac.nz> writes:
>
>> On 4/03/2009, at 11:50 AM, Michael A. Miller wrote:
>
>>> Sports scores are not statistics, they are measurements
>>> (counts) of the number of times each team scores.  There
>>> is no sampling and vanishingly small possibility of
>>> systematic error in the measurement.
>
>> I think this comment indicates a fundamental
>> misunderstanding of the nature of statistics in general and
>> the concept of variability in particular.  Measurement
>> error is only *one possible* source of variability and is
>> often a minor --- or as in the case of sports scores a
>> non-existent --- source.
>
> Would you elaborate Rolf?  I was referring to measurements, not
> statistics.  Isn't calling scores statistics similar to saying
> that the values of some response in an individual subject before
> and after treatment are statistics?  I think they are just
> measured values and that if they are measured accurately enough,
> they can be precisely known.  It is in considering the
> distribution of similar measurements obtained in repeated trials
> that statistics come into play.
>
> From my perspective as a baseball fan (I know I'm in Indiana and
> I ought to be more of a basketball fan, but I grew up as a Cubs
> watcher and still can't shake it), it doesn't seem to me that the
> purpose of the score is to allow for some inference about the
> overall population of teams.  It is about which team beats the
> other one and entertainment (and hot dogs) for the fans.

Well the *purpose* of the score has nothing to do with statistics
as such, but then the ``purpose'' of many (most?) observations
to which the ideas of statistics are applied has nothing to do
with statistics either.

Technically a statistic is any function of a *sample* (sample =
a collection of random variables), including any one of these
random variables themselves.

The purpose of the subject or discipline ``statistics'' is in essence
to answer the question ``could the phenomenon we observed have arisen
simply by chance?'', or to quantify the *uncertainty* in any estimate
that we make of a quantity.
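
To make the ``could it have arisen simply by chance?'' question
concrete, here is a minimal Python sketch of a sign-flip randomization
test on paired differences.  It is only an illustration, not a
recommended analysis: the function name, the number of resamples and
the example data are all invented for this purpose.  The idea is that
if there were no systematic effect, each difference would be equally
likely to be positive or negative, so we flip signs at random and see
how often the flipped data look at least as extreme as what we observed.

```python
import random

def sign_flip_p_value(diffs, n_resamples=10000, seed=0):
    """Approximate two-sided p-value for 'the mean difference is zero'.

    Under the null hypothesis the sign of each difference is arbitrary,
    so we flip signs at random and count how often the absolute total
    is at least as large as the observed absolute total.
    """
    rng = random.Random(seed)
    observed = abs(sum(diffs))
    hits = 0
    for _ in range(n_resamples):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(flipped)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_resamples + 1)

# Hypothetical differences, invented purely for illustration.
example_diffs = [2, 0, 1, -1, 3, 1, 0, 2, -2, 1, 1, 0]
print("approximate p-value:", sign_flip_p_value(example_diffs))
```

A small p-value says the observed pattern would be unusual if chance
alone were at work; it does not say how large the effect is.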

E.g., to stick with the sports idea:  We might ask ``Is there a home
field advantage?'' or ``How big is the home field advantage?'' or
``Is the home field advantage in the Premier Division (English football)
bigger than that in the equivalent division/league in Italian
football?''

We would collect a sample or samples of pairs of scores

	(X,Y) = (home team score, away team score)

and analyse these scores in some way, possibly on the basis of the
differences X - Y, possibly not, in order to answer these *statistical*
questions.  Note that there is *variability* or *uncertainty* in the
differences X - Y.  Even if we knew exactly that the home field
advantage was 1.576 goals, we would not be able to say that the home
team would always win by exactly 1.576 goals.  In fact the home team
would *never* win by exactly 1.576 goals! :-)
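
The point about variability in X - Y can be sketched with a small
simulation in Python.  Everything here is an assumption made for
illustration: the scores are taken to be independent Poisson counts,
the scoring rates (2.576 and 1.0) are chosen only so that the true
advantage is 1.576 goals, and 380 games is an arbitrary sample size.
The sketch estimates the advantage from the simulated differences,
attaches a standard error to it, and counts how many games were won
by exactly 1.576 goals.

```python
import math
import random
import statistics

def poisson(rng, mean):
    """Draw one Poisson variate using Knuth's multiplication method."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_differences(n_games, home_mean, away_mean, seed=1):
    """Return simulated differences X - Y for n_games matches."""
    rng = random.Random(seed)
    return [poisson(rng, home_mean) - poisson(rng, away_mean)
            for _ in range(n_games)]

def summarise(diffs):
    """Estimate the home field advantage and its standard error."""
    estimate = statistics.mean(diffs)
    std_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return estimate, std_error

# Hypothetical scoring rates, chosen so the true advantage is 1.576.
diffs = simulate_differences(n_games=380, home_mean=2.576, away_mean=1.0)
estimate, std_error = summarise(diffs)
print(f"estimated advantage: {estimate:.3f} +/- {std_error:.3f}")
print("games won by exactly 1.576 goals:",
      sum(d == 1.576 for d in diffs))
```

Each simulated difference is an integer, so the last count is always
zero, while the estimate varies around 1.576 from one seed to the next.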

Sports scores are random variables.  You don't know a priori what the
scores are going to be, do you?  (Well, if you do, you must be able to
make a *lot* of money betting on games!)  After the game is over they
aren't random any more; they're just numbers.  But that applies to any
random variable.  A random variable is random only until it is
observed, then POOF! it turns into a number.

The randomness in the scores does not arise from measurement error.
This is usually the case with integer-valued random variables.  An
ornithologist counting birds' nests in quadrats does not have to
contend with measurement error.  Well, some ornithologists might ---
depends on how well they were taught to count.  But the quadrat counts
are random variables (statistics) nevertheless --- until they are
observed.

	cheers,

		Rolf Turner
