[R] OT: irises
Thomas Lumley
tlumley at u.washington.edu
Mon Oct 13 01:02:28 CEST 2008
Attention conservation notice: a digression on Fisher's iris data, related
only tangentially to R.
The package announcement for hwriter points to a webpage created with the
package, http://www.ebi.ac.uk/~gpau/hwriter/
based on the Fisher/Anderson iris data, including pictures.
Unfortunately, the pictures are not of the right species (two appear to be
tall bearded iris cultivars, the third probably either Iris ensata or Iris
sibirica). Pictures of the right species would be very useful -- Iris
setosa really is visibly different in structure (not just in color), not
having visible upright `standards'. There are nice pictures at the Iris
Species Database: http://www.badbear.com/signa/signa.pl?Introduction
Looking for pictures I noticed that the terminology seems to have changed
since Anderson's time: most online references that distinguish between
petals and sepals for the iris will describe the standards as petals and
the falls (hanging-down bits) as sepals, so that I. setosa has very short
petals, not sepals (e.g. the US Forest Service at
http://www.fs.fed.us/wildflowers/beauty/iris/flowers.shtml).
The other historical anomaly is that many descriptions of the data read
as if Fisher were interested in whether I. versicolor and I. virginica
can be separated by linear discrimination. In fact, the hypothesis was
that I. versicolor lies between the other two species and is twice as close
to I. virginica as to I. setosa. Iris virginica has twice as many chromosomes
as I. setosa, and I. versicolor has as many as both of them put together,
so the theory was that I. versicolor would have 4 virginica and 2 setosa
alleles at each locus. [RA Fisher digital
archive at University of Adelaide, http://hdl.handle.net/2440/15227].
This is a nice example of a null hypothesis value that is not zero.
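A rough sketch of what that non-zero null value looks like, in Python. It
is not Fisher's analysis, which works with a linear compound of the four
measurements rather than one variable at a time, and it does no testing at
all. It only computes, for each measurement, the ratio of the
setosa-to-versicolor distance to the versicolor-to-virginica distance, which
the hypothesis puts at 2. The species means are the usual sample means of
the iris data as shipped with R, hard-coded here as an assumption; the names
MEANS, NULL_RATIO and distance_ratio are made up for the illustration.

```python
# Per-species sample means (cm) of the four iris measurements, hard-coded
# as an assumption so that the sketch is self-contained.
MEANS = {
    "setosa": {
        "sepal_length": 5.006, "sepal_width": 3.428,
        "petal_length": 1.462, "petal_width": 0.246,
    },
    "versicolor": {
        "sepal_length": 5.936, "sepal_width": 2.770,
        "petal_length": 4.260, "petal_width": 1.326,
    },
    "virginica": {
        "sepal_length": 6.588, "sepal_width": 2.974,
        "petal_length": 5.552, "petal_width": 2.026,
    },
}

# Hypothesised value: versicolor is twice as far from setosa as from virginica.
NULL_RATIO = 2.0


def distance_ratio(means, variable):
    """Return (versicolor - setosa) / (virginica - versicolor) for one variable."""
    se = means["setosa"][variable]
    ve = means["versicolor"][variable]
    vi = means["virginica"][variable]
    return (ve - se) / (vi - ve)


# One ratio per measurement, to be compared with NULL_RATIO rather than with 0.
ratios = {v: distance_ratio(MEANS, v) for v in MEANS["setosa"]}

for variable, ratio in ratios.items():
    print(f"{variable:13s} ratio = {ratio:7.3f}   (null value {NULL_RATIO})")
```

A negative ratio means the versicolor mean for that measurement does not lie
between the other two at all. A real test would also need the sampling
variability of the means, which this sketch ignores.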
-thomas
Thomas Lumley Assoc. Professor, Biostatistics
tlumley at u.washington.edu University of Washington, Seattle