[R] father and son heights
Rolf Turner
rolf at math.unb.ca
Sun Feb 15 20:30:43 CET 2004
Ann Loraine wrote:
> I'm looking for Pearson's father and son height data.
> ........... It's a data set that is used to teach Pearson's
> correlation coefficient in a popular statistics textbook - "Statistics"
> by Freedman, Pisani, et al.
>
> It contains over a thousand measurements of sons' and their fathers'
> heights.
>
> I would like to find it in electronic form so that I can use it to
> prepare figures and examples for a lecture.
>
> If anyone knows where I could find it, please let me know. I've done a
> few Google searches but haven't had any luck so far. I also used the
> data() command to look through R's built-in data sets and couldn't find
> it. Any suggestions would be most welcome!
I believe that you have been searching under the wrong name. The
data are most closely associated with Galton (the bloke to whom the
word ``regression'' is due) rather than with Pearson.
A search on
Galton height
led me immediately to
http://wiener.math.csi.cuny.edu/UsingR/Data/galton.html
where the data appear to be readily available.
I ***presume*** that these are the data you seek, although there are
only 930 observations, not ``over a thousand''. (Close, but!)
The data are given to a limited accuracy, which induces a strangely
grid-like appearance when they are plotted, but that is presumably
the nature of this data set. They were apparently taken from a table
prepared by Galton. Values which were originally given in Galton's
table as ``>= 73.7'' or ``<= 61.7'' are truncated to their respective
bounds.
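To make the truncation and the grid-like appearance concrete, here is a
minimal sketch in Python. It is an illustration only: the helper name
to_number and the sample rows are made up by me and are NOT taken from
Galton's table or from the file at the URL above. It strips the ">=" or
"<=" from an open-ended label, keeps the bound, and then counts how many
observations land on each (parent, child) point.

```python
from collections import Counter


def to_number(cell: str) -> float:
    """Turn a table label such as '>= 73.7' or '<= 61.7' into its bound.

    Hypothetical helper: open-ended categories are replaced by the bound
    itself, which is the truncation described in the text above.
    """
    cell = cell.strip()
    for prefix in (">=", "<="):
        if cell.startswith(prefix):
            return float(cell[len(prefix):])
    return float(cell)


# Made-up (parent, child) labels for illustration, not real observations.
rows = [
    ("70.5", ">= 73.7"),
    ("70.5", "73.7"),
    ("64.5", "<= 61.7"),
    ("68.5", "68.2"),
    ("68.5", "68.2"),
]

points = [(to_number(p), to_number(c)) for p, c in rows]

# Coarsely recorded values pile up on the same coordinates, so a scatter
# plot shows a grid of points, each standing for several observations.
counts = Counter(points)
for (parent, child), n in sorted(counts.items()):
    print(parent, child, n)
```

With data like these, plotting the raw points hides the multiplicity;
sizing each point by its count, or adding a little jitter, is the usual
way around that.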
One thing that puzzles me: The documentation says that the data
pertain to 928 children, yet there are 930 data points. (????)
I can't find an explanation in the documentation. Maybe I'm just
blind. Or thick.
cheers,
Rolf Turner
rolf at math.unb.ca