Original is http://stork.ukc.ac.uk/IMS/statistics/people/J.M.Bremner/anscombe.html

Modifications of the Anscombe regression datasets

Anscombe (1973) presented four data sets in a paper advocating more extensive use of statistical graphics, at a time when graphical tools were not generally available without some programming effort. Each data set consisted of values of an explanatory variable and a response variable, and fitting a simple linear regression model to each set led, to a close approximation, to the same results. Scatter plots of the data, however, were very different in appearance.
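
The near-agreement is easy to check numerically. The sketch below fits a least-squares line to each of Anscombe's four sets using only the Python standard library. The data values are not taken from this page; they are the figures as commonly reproduced from Anscombe (1973). The names ANSCOMBE, fit_line and summarize are illustrative.

```python
from statistics import mean

# Anscombe's four sets, as commonly reproduced from the 1973 paper
# (assumption: transcribed from the usual published table, not from this page).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # Sets 1-3 share these x values
ANSCOMBE = {
    "Set 1": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                     7.24, 4.26, 10.84, 4.82, 5.68]),
    "Set 2": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                     6.13, 3.10, 9.13, 7.26, 4.74]),
    "Set 3": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                     6.08, 5.39, 8.15, 6.42, 5.73]),
    "Set 4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
              [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
               5.25, 12.50, 5.56, 7.91, 6.89]),
}


def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x.

    Returns (intercept, slope, r_squared).
    """
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope, sxy ** 2 / (sxx * syy)


def summarize():
    """Fit every set and return {name: (intercept, slope, r_squared)}."""
    return {name: fit_line(x, y) for name, (x, y) in ANSCOMBE.items()}


if __name__ == "__main__":
    for name, (a, b, r2) in summarize().items():
        print(f"{name}: intercept={a:.4f} slope={b:.4f} R^2={r2:.4f}")
```

Each set should give an intercept close to 3.0, a slope close to 0.5 and R^2 close to 0.67. The figures agree to roughly three significant figures rather than exactly.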

Some years ago I produced four data sets, based on the Anscombe data but with the property that fitting a simple linear regression model gave exactly the same results: these data sets are referred to as Sets A, B, C and E below. I also added a fifth data set, Set D. See Bassett et al. (1986) for descriptions of these data sets. (My idea was that, if members of a class were given different data sets selected from Sets A-D to work on, they would probably not notice that they had different data sets if they just carried out the standard regression calculations.)
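
This page does not say how the modified sets were constructed. As one possible approach, and not necessarily the one used here, the sketch below takes the residuals from an ordinary least-squares fit, rescales them to a chosen residual sum of squares, and adds them back to a chosen target line. Least-squares residuals sum to zero and are uncorrelated with x, so refitting recovers the target line exactly, up to floating-point rounding. The target values, the starting data (Anscombe's first set as commonly reproduced) and the names fit_line and force_fit are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean


def fit_line(x, y):
    """Return (intercept, slope, residuals) for the least-squares line."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return intercept, slope, residuals


def force_fit(x, y, intercept, slope, rss):
    """Return new responses whose least-squares fit on x has exactly the
    requested intercept, slope and residual sum of squares."""
    _, _, e = fit_line(x, y)
    # Rescale the residuals so their sum of squares equals rss.
    scale = sqrt(rss / sum(ei ** 2 for ei in e))
    return [intercept + slope * xi + scale * ei for xi, ei in zip(x, e)]


# Hypothetical targets chosen for illustration only.
TARGET = {"intercept": 3.0, "slope": 0.5, "rss": 13.75}

# Starting data: Anscombe's first set as commonly reproduced (assumption).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]

y_new = force_fit(x, y, **TARGET)
a, b, e = fit_line(x, y_new)
rss = sum(ei ** 2 for ei in e)

if __name__ == "__main__":
    print(f"intercept={a:.10f} slope={b:.10f} rss={rss:.10f}")
```

The adjusted responses are generally not round numbers, and rounding them to two decimal places would disturb the exact agreement slightly. Standard errors and test statistics also match across sets only when the sets share the same number of points, the same mean of x and the same sum of squares of x about its mean.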

Anscombe's data

[Sets 1-4]

My data

[Sets A-D] [Set A] [Set B] [Set C] [Set D] [Set E]

References

Anscombe, F. J. (1973) Graphs in statistical analysis. The American Statistician, 27, 17-21.

Bassett, E. E., Bremner, J. M., Jolliffe, I. T., Jones, B., Morgan, B. J. T., and North, P. M. (1986) Statistics: Problems and Solutions. London: Edward Arnold.


Mike Bremner (J.M.Bremner@ukc.ac.uk, jmb@ukc.ac.uk)
15 January 1997