[R] OT: A philosophical question about statistics

Tue May 6 21:28:21 CEST 2025

Kevin,

What some call nerdy is actually a meaningful and interesting activity to
others. Your examples are quite a bit like all kinds of things in our lives
that we suddenly decide to measure. I, for example, keep track of books I
read and use R to calculate interesting (to me) statistics about how my
reading rate per month changes or what genres I am reading more of or
compare it to earlier years and make a graph showing the overlay.
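
A rough sketch of that kind of bookkeeping, assuming a made-up reading log
with one row per finished book (the data frame "books" and its columns
"finished" and "genre" are hypothetical, not an actual log):

```r
# Hypothetical reading log: one row per finished book.
books <- data.frame(
  finished = as.Date(c("2024-01-05", "2024-01-20", "2024-02-11",
                       "2025-01-09", "2025-02-02", "2025-02-25")),
  genre    = c("mystery", "history", "mystery",
               "science", "mystery", "history")
)

# Split the finish date into year and month; keep all 12 months as levels
# so that months with no books still show up as zero counts.
books$year  <- format(books$finished, "%Y")
books$month <- factor(format(books$finished, "%m"),
                      levels = sprintf("%02d", 1:12))

# Books finished per month, one row per year.
per_month <- table(books$year, books$month)

# Share of each genre within each year (rows sum to 1).
genre_share <- prop.table(table(books$year, books$genre), margin = 1)

# Overlay the years on one graph, one line per year.
matplot(1:12, t(unclass(per_month)), type = "b", lty = 1, pch = 1,
        col = seq_len(nrow(per_month)),
        xlab = "Month", ylab = "Books finished")
legend("topright", legend = rownames(per_month), lty = 1, pch = 1,
       col = seq_len(nrow(per_month)))
```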

And, once you have the education and skills in a language like R, you can
volunteer to help a charitable organization collate the "tax" on various
things it wants you to appreciate and produce statistics such as the average
number of children they are thankful for, or what percent own a house.

Often enough, a problem that starts with a small dataset can grow huge if
you keep collecting data for years. A method that worked well at first may
no longer be the best one to use. On the other hand, our computers and some
software tend to get faster over time.

I wonder how many of the R functions we have been using for years to do
statistics (or other things) have been changed over time, for better or
worse, so that they now run faster or slower in some cases. For example,
some get rewritten partially, or even completely, in some C variant, or have
the underlying functions they call changed. Others add lots of nice options
that you do not use and, in the process, become slower overall from your
perspective. An optimal solution may not remain optimal.
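
One way to check is to re-time the alternatives on whatever version of R is
currently installed. A minimal sketch, assuming an arbitrary illustrative
workload (column means of a simulated matrix; the helper names via_apply,
via_colMeans and time_it are made up here):

```r
# Hypothetical workload: column means of a numeric matrix, computed two ways.
set.seed(1)
m <- matrix(rnorm(2e5), ncol = 200)

via_apply    <- function(m) apply(m, 2, mean)
via_colMeans <- function(m) colMeans(m)

# Confirm both approaches agree before comparing their speed.
stopifnot(isTRUE(all.equal(via_apply(m), via_colMeans(m))))

# Time several repetitions of each; "elapsed" is wall-clock seconds.
time_it <- function(f, reps = 20) {
  system.time(for (i in seq_len(reps)) f(m))[["elapsed"]]
}

timings <- c(apply = time_it(via_apply), colMeans = time_it(via_colMeans))
print(timings)

# Record which R version the timings belong to, so that a later rerun
# on a newer version can be compared against them.
print(R.version.string)
```

The numbers depend on the machine and the R version, which is rather the
point: the ranking is worth rechecking after an upgrade.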

Avi

-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Kevin Zembower via
R-help
Sent: Tuesday, May 6, 2025 9:13 AM
To: Bert Gunter <bgunter.4567 using gmail.com>; R-help email list
<r-help using r-project.org>
Subject: Re: [R] OT: A philosophical question about statistics

Bert, thanks so much for your response. I woke up early this morning
and couldn't go back to sleep, formulating a response. I hope I do it
justice. See in-line below.

On Mon, 2025-05-05 at 20:09 +0100, Bert Gunter wrote:
> 1. What you have been taught is mostly useless for addressing "real"
> statistical issues;

I hope this is not so, but I see your point. If by 'real' statistical
problems, you mean the ones professional statisticians face, I agree. I
kinda assumed that a one-semester course in basic statistics wouldn't
qualify me for a professional position as a statistician.

But, if you mean MY 'real' statistical issues, I hope you're wrong.
Here are two examples:

I have sleep apnea and use a CPAP machine. I have the ability to make
changes to the settings on the machine, and have software to read the
data that it writes to an SD card. The software produces data on the
apnea-hypopnea index (AHI) and length of use each night, both of which
can be used to gauge the effectiveness of its use. When I occasionally
make changes to the settings, I'd like to know if the changes improved
my health.
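
A sketch of how such a comparison might look, assuming simulated nightly AHI
values (the vectors ahi_before and ahi_after are invented for illustration;
real values would come from the CPAP software). It treats nights as
independent, which may not hold for real data:

```r
# Simulated nightly AHI values; NOT real measurements.
set.seed(42)
ahi_before <- rgamma(30, shape = 4, rate = 1)  # 30 nights, old settings
ahi_after  <- rgamma(30, shape = 3, rate = 1)  # 30 nights, new settings

# Observed change in mean AHI (negative means fewer events per hour).
observed_diff <- mean(ahi_after) - mean(ahi_before)

# Randomization test: shuffle the before/after labels many times and
# recompute the difference in means under "the settings made no difference".
ahi_all  <- c(ahi_before, ahi_after)
n_before <- length(ahi_before)
perm_diffs <- replicate(5000, {
  shuffled <- sample(ahi_all)
  mean(shuffled[-seq_len(n_before)]) - mean(shuffled[seq_len(n_before)])
})

# One-sided p-value: share of shuffles with a drop at least as large
# as the one observed.
p_value <- mean(perm_diffs <= observed_diff)

# Theoretical counterpart: Welch two-sample t-test, same one-sided question.
welch <- t.test(ahi_after, ahi_before, alternative = "less")

print(c(observed_diff = observed_diff,
        randomization_p = p_value,
        welch_p = welch$p.value))
```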

About three years ago, we installed split-unit AC modules in our house,
replacing individual window air conditioners. Since the split units are
connected to a heat pump, they can also be used to supplement the
output of the natural-gas-fired steam boiler that is the main source of
heating. We were told that the split units would reduce our home's
energy usage, in both heating and cooling, and ultimately save money.
However, I'm not certain that I've seen that savings reflected in our
bills. I am serviced by a utility that allows me to download my house's
daily energy use, in both kilowatt-hours of electricity and therms of
natural gas.
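
A sketch of one possible analysis, assuming simulated daily data (the data
frame "energy", its columns, and the built-in effect sizes are all invented
for illustration). Because usage depends heavily on the weather, the sketch
compares gas use per heating degree day before and after the installation,
rather than raw totals:

```r
# Simulated daily data; NOT real utility readings.
set.seed(7)
n <- 200  # days in each period
energy <- data.frame(
  period    = factor(rep(c("before", "after"), each = n),
                     levels = c("before", "after")),
  mean_temp = runif(2 * n, min = 10, max = 60)  # daily mean temp, deg F
)

# Heating degree days relative to the conventional 65 deg F base.
energy$hdd <- pmax(0, 65 - energy$mean_temp)

# Simulated gas use: the lower slope after installation is an assumption
# built into the fake data, not a finding.
slope <- ifelse(energy$period == "before", 0.12, 0.10)
energy$therms <- 0.5 + slope * energy$hdd + rnorm(2 * n, sd = 0.4)

# Linear model with an interaction: the "hdd:periodafter" coefficient
# estimates the change in therms per degree day after the installation.
fit <- lm(therms ~ hdd * period, data = energy)
print(summary(fit)$coefficients)
```

With real data, the daily temperatures would have to come from a separate
weather source, and electricity use would need a similar model of its own.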

In both these examples, the data could be analyzed with the knowledge
I've gained by studying statistics, I believe. In addition, the results
have a practical significance to me, and are not self-evident without
the use of some sort of analytical tools.

Is this nerdy? Of course! I don't imagine most, or even many, CPAP
users conduct a hypothesis test when changing their machine. But, I'm
nerdy, and now I have this new tool in my toolbox to apply to problems
like these.

> 2. Most of my 40 or so years of statistical practice involved trying
> to define the questions of interest and determining whether there
> existed or how to best obtain relevant data to answer those
> questions. Once/if that was done, how to obtain answers from the data
> was usually straightforward.

That's a good insight, and I could certainly see how it's true. Almost
all of the problems I've solved in basic statistics presented the data
in a near-perfect format. We never even had to clean up data with
missing values, for example. Certainly, applying the algorithms
(simulation or theoretical) was the easiest and quickest part.

Thank you again, Bert, for replying. I always enjoy your contributions
to this group.

-Kevin
