[R] father and son heights
Gabor Grothendieck
ggrothendieck at myway.com
Sun Feb 15 20:57:10 CET 2004
According to
http://www.spss.com/research/wilkinson/Publications/galton.pdf
there are actually two father/son height datasets. One was
collected by Galton. Apparently Pearson used that data but
also collected and used a second dataset, together with Alice Lee,
in roughly the same time frame.
---
Date: Sun, 15 Feb 2004 15:30:43 -0400 (AST)
From: Rolf Turner <rolf at math.unb.ca>
To: <loraine at loraine.net>
Cc: <r-help at stat.math.ethz.ch>
Subject: Re: [R] father and son heights
Ann Loraine wrote:
> I'm looking for Pearson's father and son height data.
> ........... It's a data set that is used to teach Pearson's
> correlation coefficient in a popular statistics textbook - "Statistics"
> by Freedman, Pisani, et al.
>
> It contains over a thousand measurements of sons' and their fathers'
> heights.
>
> I would like to find it in electronic form so that I can use it to
> prepare figures and examples for a lecture.
>
> If anyone knows where I could find it, please let me know. I've done a
> few Google searches but haven't had any luck so far. I also used the
> data() command to look through R's built-in data sets and couldn't find
> it. Any suggestions would be most welcome!
I believe that you have been searching under the wrong name. The
data are most closely associated with Galton (the bloke to whom the
word ``regression'' is due) rather than with Pearson.
A search on
Galton height
led me immediately to
http://wiener.math.csi.cuny.edu/UsingR/Data/galton.html
where the data appear to be readily available.
I ***presume*** that these are the data you seek, although there are
only 930 observations, not ``over a thousand''. (Close, but!)
The data are given to a limited accuracy, which induces a strangely
grid-like appearance when they are plotted, but that is presumably
the nature of this data set. They were apparently taken from a table
prepared by Galton. Values which were originally given in Galton's
table as ``>= 73.7'' or ``<= 61.7'' are truncated to their respective
bounds.
One thing that puzzles me: The documentation says that the data
pertain to 928 children, yet there are 930 data points. (????)
I can't find an explanation in the documentation. Maybe I'm just
blind. Or thick.
cheers,
Rolf Turner
rolf at math.unb.ca