No, you're quite right; I was wrong. I must have been very confused when I
put up the first 'bug' report, #886. I noticed my error a couple of days
later and posted a comment to that effect on R-bugs, but it seems to have
vanished (or perhaps I did that wrong as well). I also agree with you on
the criterion for the choice of districts. There remains a minuscule
discrepancy between the Mosteller and Tukey description and the Princeton
one: M&T say that the 4th variable is the % of the population with
education beyond primary school, whereas the Princeton source says it's the
percentage of 'draftees' with this level of education. Given the date, the
differences between male and female education levels at the time, and the
(presumed) fact that the 'draftees' were all male, this might make a
difference to interpretation, I suppose.

It did occur to me to wonder why Mosteller and Tukey chose these particular
variables out of all those given in the source...


Kevin McConway
Department of Statistics
The Open University

> > Hardly crucial, but I've come upon a potential error in the documentation
> > of the 'swiss' data frame in the R base package. The description accurately
> > matches what is said in the Mosteller and Tukey source quoted, but
> > according to the data archived at Princeton (links from
> > http://opr.princeton.edu/archive/eufert/switz.html), the variable that
> > Mosteller and Tukey report as infant mortality is actually the proportion
> > of 'draftees' with education beyond primary school. Infant mortality is in
> > the archived file, but the values are quite different. Of course,
> > it's possible that Mosteller and Tukey were right and the people who later
> > did the archiving at Princeton got it wrong.
>
> It seems it is you that got it wrong! In the file sw1888.dat in their
> switz.zip, the variable
>  36 USCHOOL        321-330       3  Prop. draftees with > primary educ.
> corresponds to `Education' and
>  22 INFMORT        181-190       3  Infant Mortality Rate
> corresponds to `Infant Mortality'.
>
> Apart from guessing factors of 10, the data correspond to the 47 districts
> with > 50% French speakers to the accuracy they are given.
>
> If you still disagree, can you please explain how you did this?  I used
> > read.fwf("sw1888.dat", widths=c(rep(10, 45),1,2,2,2,8,1,15)) ->sw0
> > ind <- sw0[, 27] > 50000
> > sw <- sw0[ind, ]
> > sw[, c(5, 12, 34, 36, 9, 22)-3]
> the first column in the file being variable 4.
>
> Thanks for the source: as from 1.3.0 I have added the district names.
