[R-sig-teaching] Handbook of Small Datasets
Bob
bob at statland.org
Thu Jan 31 13:57:49 CET 2013
The datasets are also available on the publisher's website. I just
compared several files among those, the versions at NCSU, and the
version that came on a floppy with the book when it first came out.
File sizes are exactly the same and, for the files I looked inside, the
layout is sheer madness in all three versions.
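
To make the made-up example in the forwarded message below concrete
(six columns x,y,x,y,x,y, one pair per group z=a,b,c), here is a minimal
sketch of the stacking step. It is written in Python with only the
standard library rather than in R, and everything in it is an
assumption for illustration: the sample values, the tab delimiter, the
group labels, and the helper name stack_pairs are hypothetical and do
not come from any actual file in the collection.

```python
import csv
import io

# Made-up data in the book's table layout: x,y pairs for groups
# a, b and c sit side by side on each line (tab-delimited by assumption).
WIDE_TEXT = "1\t10\t2\t20\t3\t30\n4\t40\t5\t50\t6\t60\n"


def stack_pairs(text, groups=("a", "b", "c"), delimiter="\t"):
    """Turn rows of x,y,x,y,x,y into one (x, y, z) record per observation."""
    records = []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        for i, z in enumerate(groups):
            pair = row[2 * i: 2 * i + 2]
            # Groups of unequal size leave blank cells in the shorter
            # columns; skip those instead of inventing values.
            if len(pair) < 2 or not pair[0].strip() or not pair[1].strip():
                continue
            records.append({"x": float(pair[0]), "y": float(pair[1]), "z": z})
    # Stable sort: keeps the original order within each group.
    records.sort(key=lambda rec: rec["z"])
    return records


long_rows = stack_pairs(WIDE_TEXT)
for rec in long_rows:
    print(rec["x"], rec["y"], rec["z"])
```

Each record could then be written out with csv.DictWriter to give a
one-row-per-observation file of the kind that reads directly into an R
dataframe. Real files in the collection may need more than this (header
lines, ragged rows, other delimiters), so treat it as a starting point
only.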
Forwarded message:
>
>
> I just looked at a couple of files at NCSU and they looked like the
> original files on the disk supplied with the book. Has anyone found
> ones there that are NOT as originally supplied? Here is an example
> (made up) of the kind of problems in the original files. You might
> have a data set with three variables, x and y quantitative, and z
> categorical with three groups. The data file looks just like the
> table in the book: six columns, x,y,x,y,x,y with the pairs matching
> z=a,b,c. So the given data have to be stacked and the categorical
> variable created. Not too horrendous if you just want to use that one
> dataset, but virtually all the files have similar (often worse)
> problems, i.e., you cannot read them into an R dataframe as-is. You
> can find other actual examples in my review:
>
> Review of Two Collections of Data for Use in a First Course in
> Statistics, The American Statistician, Vol.50, No.2 (May 1996),
> pp. 168-169.
>
> I cleaned up and used about 20 datasets myself. At the time I wrote
> the review I had fantasies of finding 25 others who would each
> volunteer to clean another 20. I had long given up on that when
> Dennis surprised me by offering far more than 20. So I will be
> working with him and will also look at the NCSU versions again. I have
> also had others volunteer bits and pieces so I hope we will soon have
> all or most in usable form.
>
> PS
>
> While R gurus may feel the problem is minor FOR THEM, I had hoped
> to use the book in the following way. After teaching topic X in a
> gen. ed. intro. course, ask students to pick a data set of interest to
> them and analyze it using X. Beginners will not even notice the data
> are not in standard format, and will spend lots of time wrestling with
> the software to get usable output rather than focussing on the
> statistical concepts. I work a lot with high school teachers of
> AP Statistics who themselves have usually taken 1 plus/minus 1
> statistics courses and have NO experience with real data. I would
> LOVE to recommend this book to them but they would circle my home and
> burn it to the ground after a few hours wrestling with the form the
> data are in now.
>
> Forwarded message:
> > From: Dennis Murphy <djmuser at gmail.com>
> > Date: Thu, 31 Jan 2013 01:58:54 -0800
> > To: Jeff Laux <jefflaux at gmail.com>
> > Cc: r-sig-teaching at r-project.org
> > Subject: Re: [R-sig-teaching] Handbook of Small Datasets
> >
> > That's what the book is for: its purpose is to describe the variables
> > and context of each data set. The book 'Data' by Andrews and Herzberg
> > (1985) is similar in that respect. As I mentioned to Bob privately, I
> > thought about making an R package of the data sets in HDLMO several
> > years ago because I used a number of them in teaching, but then
> > realized that if I wrote the help pages, I'd essentially be violating
> > the copyright of the book...so that project died. But I do have a
> > collection of R objects for the data sets which I'm editing and hope
> > to finish before the weekend is out. Bob prefers a zipped csv archive,
> > but I can make an R binary available (or a zipped version of .Rdata
> > files) if anyone is interested.
> >
> > Dennis
> >
> > On Wed, Jan 30, 2013 at 7:54 PM, Jeff Laux <jefflaux at gmail.com> wrote:
> > > Yes. They can be found on NC State's Statistics department's website:
> > >
> > > http://www.stat.ncsu.edu/working_groups/sas/sicl/data/
> > >
> > > However, the accompanying stories don't exist. What is posted is just tab
> > > delimited text files with numeric data. Someone else will have to say what
> > > the numbers are supposed to mean.
> > >
> > >
> > >
> > > On 1/30/2013 6:48 PM, Bob wrote:
> > >>
> > >> Just saw a mention of _Handbook of Small Datasets_. Does anyone know
> > >> if the data files ever got cleaned up and posted on the Internet? I
> > >> bought this when it came out and the disk included files that seemed to
> > >> be created by cut and paste from the manuscript. This meant that the
> > >> "shape" of the data matched a typesetter's needs rather than a
> > >> statistician's. Most of the datasets needed considerable manual work
> > >> before one could hand them off to students. (I DID find what appeared
> > >> to be the original dysfunctional versions online.) It's really sad
> > >> that a collection that was such a good idea on paper was so poorly
> > >> implemented.
> > >>
> > >>
> > >> -------> First-time AP Stats. teacher? Help is on the way! See
> > >> http://courses.ncssm.edu/math/Stat_Inst/Stats2007/Bob%20Hayden/Relief.html
> > >> _
> > >> | | Robert W. Hayden
> > >> | | 142 Main Street
> > >> / | Apartment 104
> > >> | | Jaffrey, New Hampshire 03452 USA
> > >> | | email: bob@ the site below
> > >> / | website: http://statland.org
> > >> | x / phone: (603) 532-7224 (home)
> > >> ''''''
> > >>
> > >> _______________________________________________
> > >> R-sig-teaching at r-project.org mailing list
> > >> https://stat.ethz.ch/mailman/listinfo/r-sig-teaching
> > >>
> > >
> >
> >
> >
>
>
>
>