[R-sig-teaching] Handbook of Small Datasets

Bob bob at statland.org
Thu Jan 31 12:26:02 CET 2013


I just looked at a couple of files at NCSU and they looked like the
original files on the disk supplied with the book.  Has anyone found
ones there that are NOT as originally supplied?  Here is an example
(made up) of the kind of problems in the original files.  You might
have a data set with three variables, x and y quantitative, and z
categorical with three groups.  The data file looks just like the
table in the book: six columns, x,y,x,y,x,y, with the three (x,y) pairs
corresponding to z=a,b,c.  So the given data have to be stacked and the
categorical variable created.  That is not too horrendous if you just
want to use that one dataset, but virtually all the files have similar
(often worse) problems, i.e., you cannot read them into an R data frame
as-is.  You can find other actual examples in my review

  Review of Two Collections of Data for Use in a First Course in
  Statistics, The American Statistician, Vol.50, No.2 (May 1996), 
  pp. 168-169.
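
The stacking step described above can be sketched in a few lines.  This
is only an illustration under stated assumptions: the input is a made-up
tab-delimited file in the x,y,x,y,x,y layout (tab-delimited because that
is how the NCSU copies were described later in this thread), blank cells
mean a group simply has fewer cases, and the function names stack_pairs
and to_csv are hypothetical, not part of any existing package.

```python
import csv
import io

# Hypothetical "typesetter-shaped" file: six tab-separated columns laid out
# as x,y,x,y,x,y, where the three (x, y) pairs belong to groups a, b, c.
# Group c has one fewer case, so its cells in the second row are blank.
WIDE_TEXT = "1.0\t2.0\t1.5\t2.5\t2.0\t3.1\n1.2\t2.2\t1.7\t2.9\t\t\n"


def stack_pairs(text, groups=("a", "b", "c")):
    """Stack side-by-side (x, y) column pairs into long (x, y, z) rows.

    Rows come out group by group (all of a, then b, then c).  A pair with
    both cells blank is skipped; a pair with only one blank cell is
    treated as an error (an assumption about how the files are laid out).
    """
    table = list(csv.reader(io.StringIO(text), delimiter="\t"))
    width = 2 * len(groups)
    long_rows = []
    for i, z in enumerate(groups):
        for fields in table:
            # Pad short rows in case trailing empty cells were dropped.
            fields = fields + [""] * (width - len(fields))
            x, y = fields[2 * i].strip(), fields[2 * i + 1].strip()
            if not x and not y:
                continue  # this group has fewer cases than the others
            if not x or not y:
                raise ValueError(f"incomplete (x, y) pair for group {z}")
            long_rows.append((float(x), float(y), z))
    return long_rows


def to_csv(rows):
    """Write long rows as CSV text with a header line x,y,z."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["x", "y", "z"])
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    print(to_csv(stack_pairs(WIDE_TEXT)), end="")
```

The output has one row per case plus a grouping column, which is the
shape a plain read.csv() call in R turns directly into a data frame with
columns x, y and z.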
  
I cleaned up and used about 20 datasets myself.  At the time I wrote
the review I had fantasies of finding 25 others who would each
volunteer to clean another 20.  I had long given up on that when
Dennis surprised me by offering far more than 20.  So I will be
working with him and will also look at the NCSU versions again.  I have
also had others volunteer bits and pieces, so I hope we will soon have
all or most of them in usable form.

PS

While R gurus may feel the problem is minor FOR THEM, I had hoped
to use the book in the following way.  After teaching topic X in a
gen. ed. intro. course, ask students to pick a data set of interest to
them and analyze it using X.  Beginners will not even notice the data
are not in standard format, and will spend lots of time wrestling with
the software to get usable output rather than focussing on the
statistical concepts.  I work a lot with high school teachers of
AP Statistics who themselves have usually taken 1 plus/minus 1
statistics courses and have NO experience with real data.  I would
LOVE to recommend this book to them but they would circle my home and
burn it to the ground after a few hours wrestling with the form the
data are in now.   

Forwarded message:
> Date: Thu, 31 Jan 2013 01:58:54 -0800
> From: Dennis Murphy <djmuser at gmail.com>
> To: Jeff Laux <jefflaux at gmail.com>
> Cc: r-sig-teaching at r-project.org
> Subject: Re: [R-sig-teaching] Handbook of Small Datasets
> 
> That's what the book is for: its purpose is to describe the variables
> and context of each data set. The book 'Data' by Andrews and Herzberg
> (1985) is similar in that respect. As I mentioned to Bob privately, I
> thought about making an R package of the data sets in HDLMO several
> years ago because I used a number of them in teaching, but then
> realized that if I wrote the help pages, I'd essentially be violating
> the copyright of the book, so that project died. But I do have a
> collection of R objects for the data sets, which I'm editing and hope
> to finish before the weekend is out. Bob prefers a zipped CSV archive,
> but I can make an R binary available (or a zipped version of .Rdata
> files) if anyone is interested.
> 
> Dennis
> 
> On Wed, Jan 30, 2013 at 7:54 PM, Jeff Laux <jefflaux at gmail.com> wrote:
> > Yes.  They can be found on NC State's Statistics department's website:
> >
> >      http://www.stat.ncsu.edu/working_groups/sas/sicl/data/
> >
> > However, the accompanying stories don't exist.  What is posted are just
> > tab-delimited text files with numeric data.  Someone else will have to say
> > what the numbers are supposed to mean.
> >
> >
> >
> > On 1/30/2013 6:48 PM, Bob wrote:
> >>
> >> Just saw a mention of _Handbook of Small Datasets_.  Does anyone know
> >> if the data files ever got cleaned up and posted on the Internet?  I
> >> bought this when it came out and the disk included files that seemed to
> >> be created by cut and paste from the manuscript.  This meant that the
> >> "shape" of the data matched a typesetter's needs rather than a
> >> statistician's.  Most of the datasets needed considerable manual work
> >> before one could hand them off to students.  (I DID find what appeared
> >> to be the original dysfunctional versions online.)  It's really sad
> >> that a collection that was such a good idea on paper was so poorly
> >> implemented.
> >>
> >>
> >> ------->  First-time AP Stats. teacher?  Help is on the way! See
> >> http://courses.ncssm.edu/math/Stat_Inst/Stats2007/Bob%20Hayden/Relief.html
> >>        _
> >>       | |          Robert W. Hayden
> >>       | |          142 Main Street
> >>      /  |          Apartment 104
> >>     |   |          Jaffrey, New Hampshire 03452  USA
> >>     |   |          email: bob@ the site below
> >>    /    |          website: http://statland.org
> >>   | x   /          phone: (603) 532-7224 (home)
> >>   ''''''
> >>
> >> _______________________________________________
> >> R-sig-teaching at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-sig-teaching
> >>
> >
> 
> 
> 




