[R-sig-teaching] Handbook of Small Datasets

Bob bob at statland.org
Thu Jan 31 13:57:49 CET 2013


The datasets are also available on the publisher's website.  I just
compared several of those files with the versions at NCSU and with the
versions that came on a floppy with the book when it first came out.
File sizes are exactly the same and, for the files I looked inside, the
layout is sheer madness in all three versions.
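
As a rough illustration of the cleanup involved, here is a minimal Python
sketch of the made-up example described in the forwarded message below
(six columns x,y,x,y,x,y, one pair per group of z).  The sample values,
the group labels a, b, c, the blank cells for a shorter group, and the
function names stack_pairs and to_csv are all assumptions for
illustration; none of them come from the actual files.

```python
import csv
import io

# Hypothetical wide file mimicking the made-up example: tab-delimited,
# no header row, six columns x,y,x,y,x,y for groups a, b, c.
# The blank pair in the last row stands for a group with fewer cases.
RAW = (
    "1.0\t2.1\t1.5\t2.9\t0.8\t1.7\n"
    "2.0\t3.9\t2.5\t5.1\t1.9\t3.6\n"
    "3.0\t6.2\t\t\t2.7\t5.5\n"
)


def stack_pairs(text, groups=("a", "b", "c")):
    """Turn rows of repeated (x, y) pairs into long rows with x, y, z."""
    long_rows = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        for i, group in enumerate(groups):
            pair = row[2 * i : 2 * i + 2]
            # Skip missing or blank pairs (groups of unequal size).
            if len(pair) < 2 or not pair[0].strip() or not pair[1].strip():
                continue
            long_rows.append(
                {"x": float(pair[0]), "y": float(pair[1]), "z": group}
            )
    # Stable sort keeps each group's observations together, in file order.
    long_rows.sort(key=lambda r: groups.index(r["z"]))
    return long_rows


def to_csv(rows):
    """Write the stacked rows as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["x", "y", "z"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


tidy = stack_pairs(RAW)
print(to_csv(tidy))
```

The result has one row per observation and an explicit z column, which
is the shape a standard CSV reader expects.  Real files from the book
would each need their own variant of this, since the layouts differ.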

Forwarded message:
> 
> 
> I just looked at a couple of files at NCSU and they looked like the
> original files on the disk supplied with the book.  Has anyone found
> ones there that are NOT as originally supplied?  Here is an example
> (made up) of the kind of problems in the original files.  You might
> have a data set with three variables, x and y quantitative, and z
> categorical with three groups.  The data file looks just like the
> table in the book: six columns, x,y,x,y,x,y with the pairs matching
> z=a,b,c.  So the given data have to be stacked and the categorical
> variable created.  Not too horrendous if you just want to use that one
> dataset, but virtually all the files have similar (often worse)
> problems, i.e., you cannot read them into an R dataframe as-is.  You
> can find other actual examples in my review
> 
>   Review of Two Collections of Data for Use in a First Course in
>   Statistics, The American Statistician, Vol.50, No.2 (May 1996), 
>   pp. 168-169.
>   
> I cleaned up and used about 20 datasets myself.  At the time I wrote
> the review I had fantasies of finding 25 others who would each
> volunteer to clean another 20.  I had long given up on that when
> Dennis surprised me by offering far more than 20.  So I will be
> working with him and will also look at the NCSU versions again. I have
> also had others volunteer bits and pieces so I hope we will soon have
> all or most in useable form.
> 
> PS
> 
> While R gurus may feel the problem is minor FOR THEM, I had hoped
> to use the book in the following way.  After teaching topic X in a
> gen. ed. intro. course, ask students to pick a data set of interest to
> them and analyze it using X.  Beginners will not even notice the data
> are not in standard format, and will spend lots of time wrestling with
> the software to get usable output rather than focussing on the
> statistical concepts.  I work a lot with high school teachers of
> AP Statistics who themselves have usually taken 1 plus/minus 1
> statistics courses and have NO experience with real data.  I would
> LOVE to recommend this book to them but they would circle my home and
> burn it to the ground after a few hours wrestling with the form the
> data are in now.   
> 
> Forwarded message:
> > Date: Thu, 31 Jan 2013 01:58:54 -0800
> > From: Dennis Murphy <djmuser at gmail.com>
> > To: Jeff Laux <jefflaux at gmail.com>
> > Cc: r-sig-teaching at r-project.org
> > Subject: Re: [R-sig-teaching] Handbook of Small Datasets
> > 
> > That's what the book is for: its purpose is to describe the variables
> > and context of each data set. The book 'Data' by Andrews and Herzberg
> > (1985) is similar in that respect. As I mentioned to Bob privately, I
> > thought about making an R package of the data sets in HDLMO several
> > years ago because I used a number of them in teaching, but then
> > realized that if I wrote the help pages, I'd essentially be violating
> > the copyright of the book...so that project died. But I do have a
> > collection of R objects for the data sets which I'm editing and hope
> > to finish before the weekend is out. Bob prefers a zipped csv archive,
> > but I can make an R binary available (or a zipped version of .Rdata
> > files) if anyone is interested.
> > 
> > Dennis
> > 
> > On Wed, Jan 30, 2013 at 7:54 PM, Jeff Laux <jefflaux at gmail.com> wrote:
> > > Yes.  They can be found on NC State's Statistics department's website:
> > >
> > >      http://www.stat.ncsu.edu/working_groups/sas/sicl/data/
> > >
> > > However, the accompanying stories don't exist.  What is posted is just tab
> > > delimited text files with numeric data.  Someone else will have to say what
> > > the numbers are supposed to mean.
> > >
> > >
> > >
> > > On 1/30/2013 6:48 PM, Bob wrote:
> > >>
> > >> Just saw a mention of _Handbook of Small Datasets_.  Does anyone know
> > >> if the data files ever got cleaned up and posted on the Internet?  I
> > >> bought this when it came out and the disk included files that seemed to
> > >> be created by cut and paste from the manuscript.  This meant that the
> > >> "shape" of the data matched a typesetter's needs rather than a
> > >> statistician's.  Most of the datasets needed considerable manual work
> > >> before one could hand them off to students.  (I DID find what appeared
> > >> to be the original dysfunctional versions online.)  It's really sad
> > >> that a collection that was such a good idea on paper was so poorly
> > >> implemented.
> > >>
> > >>
> > >> ------->  First-time AP Stats. teacher?  Help is on the way! See
> > >> http://courses.ncssm.edu/math/Stat_Inst/Stats2007/Bob%20Hayden/Relief.html
> > >>        _
> > >>       | |          Robert W. Hayden
> > >>       | |          142 Main Street
> > >>      /  |          Apartment 104
> > >>     |   |          Jaffrey, New Hampshire 03452  USA
> > >>     |   |          email: bob@ the site below
> > >>    /    |          website: http://statland.org
> > >>   | x   /          phone: (603) 532-7224 (home)
> > >>   ''''''
> > >>
> > >> _______________________________________________
> > >> R-sig-teaching at r-project.org mailing list
> > >> https://stat.ethz.ch/mailman/listinfo/r-sig-teaching
> > >>
> > >
> > 
> > 
> 
> 
> 
> 




