(PR#2453) [Rd] ctest package: wilcox.test() produces integer
ripley@stats.ox.ac.uk
Tue Jan 14 18:51:05 2003
We've seen the integer overflow problem before, in ks.test; it is easily solved.
The help page says x and y must be numeric, so this is user error. I've
added tests to the code.
Why do people file bug reports without reading the help/man page?
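For readers following the thread, the overflow in the quoted report can be
reproduced from the sample sizes alone. The sketch below is illustrative
only and is not the code added to wilcox.test; the names prod_int and
prod_dbl are made up for this example. Each factor in the full data set
has 73378 observations, and 73378 * 73378 = 5384330884 exceeds
.Machine$integer.max (2147483647), whereas the reduced data set gives
39999 * 39999 = 1599920001, which still fits.

```r
# Sample sizes from the report: both factors have 73378 observations.
# length() returns an integer, so n.x * n.y is integer arithmetic.
n.x <- 73378L
n.y <- 73378L

# Integer multiplication past .Machine$integer.max yields NA (with a
# warning, suppressed here so the script runs quietly).
prod_int <- suppressWarnings(n.x * n.y)
print(is.na(prod_int))

# Converting one operand to double before multiplying avoids the overflow.
prod_dbl <- as.double(n.x) * n.y
print(prod_dbl)

# The reduced data set (39999 rows) stays below the integer limit.
print(39999L * 39999L)
```

Converting to double before multiplying is one common way to avoid this
class of overflow; whether the actual fix takes that form is an assumption.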
On Tue, 14 Jan 2003 bates@stat.wisc.edu wrote:
> This was filed as a bug report on the Debian r-base package. It is
> more properly a bug report on the ctest package in R.
>
> The default method for wilcox.test manipulates x and y without
> checking the class or data.class of these objects. Possible solutions
> are
> - create wilcox.test.factor (if appropriate)
> - check the class and/or data.class of x and y in wilcox.test.default
> and produce error messages or warnings for inappropriate objects
> - coerce to numeric unconditionally (probably not a good idea)
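As a rough illustration of the second option, a check along the following
lines would reject factors before any arithmetic is attempted. This is a
minimal sketch: check_numeric_sample is a hypothetical helper invented for
this example, not a function in ctest, and the error wording is assumed.
It also shows where the "-" warning comes from: the subtraction operator
is applied to a factor, which returns NA.

```r
# Hypothetical helper (not part of ctest): stop unless the sample is numeric.
check_numeric_sample <- function(x, name = deparse(substitute(x))) {
  if (!is.numeric(x))
    stop("'", name, "' must be numeric")
  invisible(x)
}

a <- factor(c(2, 3))

# Arithmetic on a factor is not meaningful: the result is NA, with the
# warning seen in the report (suppressed here).
diff_a <- suppressWarnings(a - 0)
print(all(is.na(diff_a)))

# is.numeric() is FALSE for factors, so the check raises an error.
msg <- tryCatch(check_numeric_sample(a),
                error = function(e) conditionMessage(e))
print(msg)

# If the factor levels are numeric labels, the user can convert explicitly.
a_num <- as.numeric(as.character(a))
check_numeric_sample(a_num)
print(a_num)
```

Leaving the conversion to the user, rather than coercing unconditionally,
keeps the decision about whether ordinal codes may be treated as numbers
with the person who knows the data.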
>
> Martin Michlmayr <tbm@cyrius.com> writes:
>
> > Package: r-base
> > Version: 1.5.0-2 / 1.6.1.cvs.20030103-1
> > Severity: normal
> >
> > I have some ordinal data and I wanted to perform a U-test. However,
> > a problem occurred:
> >
> > > x <- read.table("spss-3.txt", header=TRUE)
> > > a = factor(x$a)
> > > b = factor(x$b)
> > > summary(a)
> > 1 2 3 4 5 6
> > 23900 20362 15238 10007 3399 472
> > > summary(b)
> > 1 2 3 4 5 6
> > 23809 20649 15069 9952 3415 484
> > > wilcox.test(a, b)
> >
> > Wilcoxon rank sum test with continuity correction
> >
> > data: a and b
> > W = 5384330884, p-value = NA
> > alternative hypothesis: true mu is not equal to 0
> >
> > Warning messages:
> > 1: "-" not meaningful for factors in: Ops.factor(x, mu)
> > 2: NAs produced by integer overflow in: n.x * n.y
> > 3: NAs produced by integer overflow in: n.x * n.y
> > >
> >
> > Now there appear to be two issues. First, the NAs produced by
> > integer overflow: this looks like an R bug with big data sets,
> > since the warning goes away when I use less data:
> >
> > 57:tbm@arborlon: ~] wc -l s
> > 40000 s
> >
> > > summary(a)
> > 1 2 3 4 5 6
> > 13034 11086 8341 5412 1869 257
> > > summary(b)
> > 1 2 3 4 5 6
> > 13034 11086 8341 5412 1869 257
> > > wilcox.test(a, b)
> >
> > Wilcoxon rank sum test with continuity correction
> >
> > data: a and b
> > W = 1599920001, p-value = < 2.2e-16
> > alternative hypothesis: true mu is not equal to 0
> >
> > Warning message:
> > "-" not meaningful for factors in: Ops.factor(x, mu)
> > >
> >
> >
> > However, I still don't know what the other warning is. I don't have a
> > "-" in my data. I reduced the data to 2 lines and the problem still
> > occurs:
> >
> > > summary(a)
> > 2 3
> > 1 1
> > > summary(b)
> > 2 3
> > 1 1
> > > wilcox.test(a, b)
> >
> > Wilcoxon rank sum test
> >
> > data: a and b
> > W = 4, p-value = 0.3333
> > alternative hypothesis: true mu is not equal to 0
> >
> > Warning message:
> > "-" not meaningful for factors in: Ops.factor(x, mu)
> > >
> >
> > The file is:
> >
> > 67:tbm@arborlon: ~] cat s
> > a b
> > 2 4
> > 3 1
> > 68:tbm@arborlon: ~]
> >
> >
> > I'm not an R expert, so this might be a pilot error; but I don't see
> > where.
> >
> >
> > -- System Information:
> > Debian Release: 3.0
> > Architecture: i386
> > Kernel: Linux regression 2.4.19-686 #1 Thu Aug 8 21:30:09 EST 2002 i686
> > Locale: LANG=en_US, LC_CTYPE=en_US
> >
> > Versions of packages r-base depends on:
> > ii r-base-core 1.5.0-2 GNU R core of statistical computin
> > ii r-base-html 1.5.0-2 GNU R html docs for statistical co
> > ii r-base-latex 1.5.0-2 GNU R LaTeX docs for statistical c
> >
> > -- no debconf information
> >
> >
> > --
> > Martin Michlmayr
> > tbm@cyrius.com
> >
> >
>
>
--
Brian D. Ripley, ripley@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595