[Rd] ctest package: wilcox.test() produces integer overflow (PR#2453)
bates@stat.wisc.edu
Tue Jan 14 18:11:05 2003
This was filed as a bug report on the Debian r-base package. It is
more properly a bug report on the ctest package in R.
The default method for wilcox.test manipulates x and y without
checking the class or data.class of these objects. Possible solutions
are:
- create wilcox.test.factor (if appropriate)
- check the class and/or data.class of x and y in wilcox.test.default
  and produce error messages or warnings for inappropriate objects
- coerce to numeric unconditionally (probably not a good idea)
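The second option could look roughly like the sketch below. The helper
name check_wilcox_arg is hypothetical and not part of ctest; the messages
and the choice to stop rather than warn are assumptions for illustration
only.

```r
# Hypothetical helper (not in ctest): reject inputs on which
# wilcox.test.default cannot meaningfully do arithmetic such as x - mu.
check_wilcox_arg <- function(x, name) {
    if (is.factor(x))
        stop("'", name, "' is a factor; convert it to numeric scores first")
    if (!is.numeric(x))
        stop("'", name, "' must be numeric")
    invisible(x)
}

# A factor like the one in the report is rejected with a clear message
# instead of the obscure Ops.factor warning.
a <- factor(c(2, 3))
res <- tryCatch(check_wilcox_arg(a, "x"),
                error = function(e) conditionMessage(e))
print(res)

# When the factor levels are themselves numbers, the user can recover
# them explicitly; as.numeric(a) alone would return the internal codes.
a_num <- as.numeric(as.character(a))
check_wilcox_arg(a_num, "x")
```

Leaving the conversion to the user keeps the decision about how to score
an ordinal factor out of the test itself.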
Martin Michlmayr <tbm@cyrius.com> writes:
> Package: r-base
> Version: 1.5.0-2 / 1.6.1.cvs.20030103-1
> Severity: normal
>
> I have some ordinal data and wanted to perform a U-test. However,
> a problem occurred:
>
> > x <- read.table("spss-3.txt", header=TRUE)
> > a = factor(x$a)
> > b = factor(x$b)
> > summary(a)
> 1 2 3 4 5 6
> 23900 20362 15238 10007 3399 472
> > summary(b)
> 1 2 3 4 5 6
> 23809 20649 15069 9952 3415 484
> > wilcox.test(a, b)
>
> Wilcoxon rank sum test with continuity correction
>
> data: a and b
> W = 5384330884, p-value = NA
> alternative hypothesis: true mu is not equal to 0
>
> Warning messages:
> 1: "-" not meaningful for factors in: Ops.factor(x, mu)
> 2: NAs produced by integer overflow in: n.x * n.y
> 3: NAs produced by integer overflow in: n.x * n.y
> >
>
> There appear to be two issues. First, the NAs produced by integer
> overflow: this looks like an R bug with big data sets, because the
> overflow warnings go away when I use less data:
>
> 57:tbm@arborlon: ~] wc -l s
> 40000 s
>
> > summary(a)
> 1 2 3 4 5 6
> 13034 11086 8341 5412 1869 257
> > summary(b)
> 1 2 3 4 5 6
> 13034 11086 8341 5412 1869 257
> > wilcox.test(a, b)
>
> Wilcoxon rank sum test with continuity correction
>
> data: a and b
> W = 1599920001, p-value = < 2.2e-16
> alternative hypothesis: true mu is not equal to 0
>
> Warning message:
> "-" not meaningful for factors in: Ops.factor(x, mu)
> >
>
>
> However, I still don't know what the other warning is. I don't have a
> "-" in my data. I reduced the data to 2 lines and the problem still
> occurs:
>
> > summary(a)
> 2 3
> 1 1
> > summary(b)
> 2 3
> 1 1
> > wilcox.test(a, b)
>
> Wilcoxon rank sum test
>
> data: a and b
> W = 4, p-value = 0.3333
> alternative hypothesis: true mu is not equal to 0
>
> Warning message:
> "-" not meaningful for factors in: Ops.factor(x, mu)
> >
>
> The file is:
>
> 67:tbm@arborlon: ~] cat s
> a b
> 2 4
> 3 1
> 68:tbm@arborlon: ~]
>
>
> I'm not an R expert, so this might be a pilot error; but I don't see
> where.
>
>
> -- System Information:
> Debian Release: 3.0
> Architecture: i386
> Kernel: Linux regression 2.4.19-686 #1 Thu Aug 8 21:30:09 EST 2002 i686
> Locale: LANG=en_US, LC_CTYPE=en_US
>
> Versions of packages r-base depends on:
> ii r-base-core 1.5.0-2 GNU R core of statistical computin
> ii r-base-html 1.5.0-2 GNU R html docs for statistical co
> ii r-base-latex 1.5.0-2 GNU R LaTeX docs for statistical c
>
> -- no debconf information
>
>
> --
> Martin Michlmayr
> tbm@cyrius.com
>
>
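The overflow itself can be reproduced from the counts quoted above. The
level counts in the first summary sum to 73378 observations per sample,
and those in the second to 39999. The sketch below assumes that n.x and
n.y are held as integers, as the warning text suggests. The as.double()
coercion is only one possible workaround, not necessarily the fix that
should go into ctest.

```r
# Sample sizes taken from the summaries in the report (sums of level counts).
n.x <- 73378L
n.y <- 73378L

# Largest value an R integer can hold.
int_max <- .Machine$integer.max

# Integer multiplication overflows and yields NA (with a warning).
prod_int <- suppressWarnings(n.x * n.y)

# Doing the arithmetic in double precision avoids the overflow.
prod_dbl <- as.double(n.x) * n.y

print(int_max)
print(prod_int)
print(prod_dbl)

# The smaller data set stays below the integer limit, which is why the
# overflow warnings disappear there.
small <- 39999L * 39999L
print(small)
```

The double-precision product is 5384330884 and the smaller product is
1599920001, which are exactly the W values reported in the two runs.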
--
Douglas Bates bates@stat.wisc.edu
Statistics Department 608/262-2598
University of Wisconsin - Madison http://www.stat.wisc.edu/~bates/