[R-sig-teaching] Online resources for teaching intro stats using R

Douglas Bates bates at stat.wisc.edu
Mon Oct 8 21:35:06 CEST 2007


As a long-time user and developer of S then R, I am committed to
having students, even introductory students, use R in my courses.  I
primarily teach introductory statistics for engineering students.  I
think that using R to remove the computational burden (except, of
course, for the need to learn to use R to some extent) is a remarkable
enhancement to any intro stats course.

I also see R as a way of removing some of the material in our courses
that is no longer necessary.  I tell students that the textbook in the
introductory engineering statistics course that I took in Spring
semester of 1969 has essentially the same table of contents as many
current texts for such a course, despite all the changes in computing
technology.  In 1969 you could easily tell who the engineering
students were because we all carried slide rules everywhere.  I don't
see a lot of slide rules around campus today.

The inertia regarding topics in an intro course results in students
still being taught
 - the normal approximation to a binomial distribution
 - the Poisson approximation to a binomial distribution
 - the concept of "sampling with replacement" to motivate a binomial
approximation to a hypergeometric distribution
 - histograms but not empirical density plots
 - hypothesis tests with a fixed level and a rejection region for the
test statistic, rather than evaluation of a p-value
 - "large-sample" z-tests or intervals versus "small-sample" t-tests
or intervals
We really do owe it to ourselves as a profession to ask ourselves why
we continue to teach such topics.
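
As a rough sketch of the last two items in the list, a single call to
t.test() reports a p-value and a t-based interval directly, with no
rejection region and no choice between "large-sample" and
"small-sample" formulas.  The data, the null value of 10 and the 5%
level below are made up for illustration and do not come from any
particular course.

```r
# Made-up data: 12 simulated measurements (illustrative values only)
set.seed(1)
x <- rnorm(12, mean = 10.5, sd = 2)

# One call gives the test statistic, p-value and confidence interval
res <- t.test(x, mu = 10)
print(res$p.value)
print(res$conf.int)

# The fixed-level, rejection-region version of the same test, for comparison
alpha <- 0.05                                  # assumed significance level
crit <- qt(1 - alpha / 2, df = length(x) - 1)  # two-sided critical value
reject <- abs(res$statistic) > crit

# Both routes lead to the same decision; the p-value simply says more
print(c(reject_by_region = unname(reject),
        reject_by_p_value = res$p.value < alpha))
```

The t distribution is used whatever the sample size, so the student
never has to decide whether n is "large".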

If you stop and think for a moment, none of the approximations of
distributions that we teach in intro courses are needed.  If you are
modeling drawing a random sample from a population of size 18,000,000
and a hypergeometric distribution is appropriate then you can and
should use a hypergeometric.
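
A minimal sketch of what this looks like in R follows.  Apart from the
population size of 18,000,000, every number here (the number of trials,
the success probability, the sample size, the count of "successes" in
the population) is an assumption chosen only for illustration.

```r
# Binomial: P(X <= 3) for 50 trials with success probability 0.1
# (illustrative values, not from the original post)
n_trials  <- 50
p_success <- 0.1
k         <- 3

binom_exact   <- pbinom(k, size = n_trials, prob = p_success)
# Normal approximation with a continuity correction
binom_normal  <- pnorm(k + 0.5,
                       mean = n_trials * p_success,
                       sd   = sqrt(n_trials * p_success * (1 - p_success)))
# Poisson approximation with rate n * p
binom_poisson <- ppois(k, lambda = n_trials * p_success)

# Hypergeometric: sample of 100 from a population of 18,000,000,
# of which 5,400,000 are assumed to have the property of interest
pop_size <- 18e6
n_marked <- 5.4e6   # assumed count of "successes" in the population
n_sample <- 100     # assumed sample size
q        <- 25

hyper_exact <- phyper(q, m = n_marked, n = pop_size - n_marked, k = n_sample)
# Binomial approximation ("sampling with replacement")
hyper_binom <- pbinom(q, size = n_sample, prob = n_marked / pop_size)

print(c(binom_exact = binom_exact, binom_normal = binom_normal,
        binom_poisson = binom_poisson))
print(c(hyper_exact = hyper_exact, hyper_binom = hyper_binom))
```

The exact calls are no harder to write than the approximate ones, which
is the point: the approximation saves the student nothing.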

Over the last several years I often found myself in the position of
opposing the textbook in my courses.  I would say that the text
describes this awkward way of doing things (transform to a standard
normal, juggle around the "less than" and "greater than" signs until
you can evaluate a probability from a table in the text) but you, the
student, should ignore that and do things the much easier way of
simply evaluating the probability that you want to evaluate.  This is
a burden on students.  Even though we would like to think that our
brilliant lectures are the main font of wisdom for students in our
courses, a substantial portion of them learn most of the material from
the text. When the text and the instructor disagree, confusion ensues.
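
To make the contrast concrete, here is a small sketch of the two
routes to the same normal probability.  The mean, standard deviation
and interval endpoints are hypothetical values picked for the example.

```r
# Hypothetical normal model and interval (illustrative values only)
mu    <- 100
sigma <- 15
a     <- 90
b     <- 120

# The easier way: ask for P(a < X < b) on the original scale
direct <- pnorm(b, mean = mu, sd = sigma) - pnorm(a, mean = mu, sd = sigma)

# The table way: standardize first, then look up standard normal values
z_a <- (a - mu) / sigma
z_b <- (b - mu) / sigma
via_table <- pnorm(z_b) - pnorm(z_a)

# Upper-tail probabilities need no juggling of inequality signs either
upper <- pnorm(b, mean = mu, sd = sigma, lower.tail = FALSE)

print(c(direct = direct, via_table = via_table, upper = upper))
```

Both routes give the same number; the first one just skips the
standardization step and the table lookup.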

I have reached the point where I can tell in a few seconds if I want
to consider a text.  If I open it up and see probability tables in an
appendix I reject it.

One approach is to change to a text that does use R, such as the books
by Peter Dalgaard or John Verzani or Michael Crawley.  In fact I am
using Peter's book but it is difficult to use as a stand-alone text if
one is also expected to cover some probability.

I am supplementing Peter's book with slides and other PDF documents
created from Sweave sources.  As seems to happen in courses that I
teach, these are "just in time" documents (and, on occasion, "just a
little too late").

It is always difficult to create a textbook using a system like R
because the software changes so rapidly relative to the time scale for
writing and publishing texts.  A five-year-old book like Peter's is a
recent book.  A five-year-old version of R is ancient.  I'm back at my
old tricks of disagreeing with the text even on Peter's book because I
think that Peter does a wonderful job of explaining traditional
graphics in R but students should forgo that and learn lattice right
from the start.  In the five years since Peter's book was published
(which means six or seven years since he wrote various sections of it)
lattice has matured tremendously and is now documented in Paul
Murrell's book on R Graphics and in Deepayan's forthcoming book on
Lattice (which is "insanely great", by the way).
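
For readers who have not seen lattice, the following is a small sketch
of an empirical density plot conditioned on a grouping factor.  The
data frame, its column names and the simulated values are invented for
this example; only densityplot() itself comes from the lattice package.

```r
library(lattice)

# Simulated data in two groups (hypothetical, for illustration only)
set.seed(42)
d <- data.frame(
  y = c(rnorm(100), rnorm(100, mean = 2)),
  g = gl(2, 100, labels = c("A", "B"))
)

# One panel per group, with the observations shown as a rug
p <- densityplot(~ y | g, data = d, plot.points = "rug")
print(p)
```

The same formula interface carries over to histogram(), xyplot() and
the other lattice functions, which is part of the argument for starting
students there.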

Many publishers (but, thankfully, not Peter's publisher) would like to
be able to control all aspects of the course presentation.  It is
natural to them that if there is to be electronic material, such as
data sets and sample analyses, associated with the text then they
should publish it as a CD-ROM to be included with the text.  We all
know that doesn't work because the CD-ROM is going to be exactly as
old as the text but the material on the CD-ROM should have changed
much more rapidly.

I am becoming convinced that all the supplemental materials should be
available on the web.  CRAN packages provide one very useful way of
disseminating the supplemental material but other forms, perhaps a
wiki, may also be useful.  There is a need for practice material and
worked-out examples and reference sheets for basic definitions and
facts about distributions, for example, that go beyond what would
traditionally be part of a CRAN package.

Some materials on general topics such as various probability
distributions or data graphics or classical tests in R could be useful
without reference to a particular book.  I am thinking of the sort of
"Schaum's Outline" supplement where basic properties and definitions
are presented and a number of worked-out examples are given.  Another
useful resource for intro teaching would be a repository of test
questions although that may perhaps be too instructor-dependent.

I don't know the best way to collaborate on building such resources.
I would consider something like a series of vignettes so the R code
could be available as well as a printable file and probably the
sources.  Others may find a wiki to be more natural although I am
still trying to decide what a wiki is (remember - I was taking intro
stats in 1969).  One possible collaborative mechanism is the
R-forge.R-project.org site where we could start a project and
contribute to it.  It has the advantage of a stable and relatively
easy to remember URL for referring students.

I welcome private replies and comments or, preferably, a discussion on
this list.



