[R] Possible bug in apply()
Clive Jenkins
clive.jenkins at clara.net
Sat Oct 7 18:12:38 CEST 2000
In the course of applying Shapiro-Wilk to 100,000 samples of 60 items
from 100,000 different distributions, I encountered a fatal error in
apply(). This can be reproduced as follows, using the attached data
file distr.dat containing 2 lines of my original 100,000-line file:
> version
         _
platform Windows
arch     x86
os       Win32
system   x86, Win32
status
major    1
minor    1.1
year     2000
month    August
day      15
language R
> # Read the data in
> dist.dat <- matrix(scan("distr.dat"), byrow=T, ncol=60)
Read 120 items
> # Define a function to perform Shapiro-Wilk, with protection
> # against values that would cause fatal errors in shapiro.test()
> shap <- function(x){
+   result.W <- NA; result.p <- NA           # Set default return value
+   if (length(x[!is.na(x)])>3){             # Check for bad values
+     x.var <- var(x, na.rm=T)
+     if (!is.na(x.var)){
+       if(x.var>0){                         # If values OK, perform test
+         shap.res <- shapiro.test(x)
+         result.W <- shap.res$statistic     # Problem line
+         result.p <- shap.res$p.value
+       }
+     }
+   }
+   c(result.W, result.p)
+ }
> apply(dist.dat, 1, shap)
Error in names(x) == ans.names : comparison (1) is possible only for
vector types
>
If we look at the structure of the value returned by shap() for the
first sample, we see that W has a "names" attribute whereas p has
not (or its "names" attribute is the empty string):
> str(shap(dist.dat[1,]))
Named num [1:2] 0.887519 0.000622
- attr(*, "names")= chr [1:2] "W" ""
>
but in the case of the second sample (all NA except one), shapiro.test()
was not called, and the preset default value c(NA, NA) was returned:
> str(shap(dist.dat[2,]))
logi [1:2] NA NA
>
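The difference between the two return values can be shown without calling shapiro.test() at all. The following is only a minimal sketch: the numbers are copied from the output above, and the variable names w, p, tested and skipped are made up for illustration.

```r
# Stand-in values copied from the str() output above (illustrative names).
w <- c(W = 0.887519)    # named, like shap.res$statistic
p <- 0.000622           # unnamed, like shap.res$p.value

tested  <- c(w, p)      # first sample: the test was run
skipped <- c(NA, NA)    # second sample: the defaults were returned

print(names(tested))    # one real name and one empty string
print(names(skipped))   # no names attribute at all
print(identical(names(tested), names(skipped)))
```

So across rows the function returns vectors whose "names" attributes disagree, which appears to be what the comparison in the error message is tripping over.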
Looking directly at the result of shapiro.test(), we get:
> str(shapiro.test(dist.dat[1,]))
List of 4
 $ statistic: Named num 0.888
  ..- attr(*, "names")= chr "W"
 $ p.value  : num 0.000622
 $ method   : chr "Shapiro-Wilk normality test"
 $ data.name: chr "dist.dat[1, ]"
 - attr(*, "class")= chr "htest"
>
So shapiro.test()$statistic has a "names" attribute, whereas
shapiro.test()$p.value does not, and my default return value c(NA, NA)
does not. If I remove the "names" attribute from W by changing the line
of code marked "# Problem line" to
result.W <- as.numeric(shap.res$statistic)
the error disappears. Reading the help for apply() I could find no
reference to "names" attributes, let alone any restrictions on them.
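An alternative to stripping the name is to give every return path the same names. The sketch below is one possible variant, not a tested replacement for the function above: shap2 and the small matrix m are hypothetical names and made-up data for illustration, and it assumes that dropping the NAs before the test is acceptable.

```r
# Hypothetical variant of shap(); the name shap2 is illustrative only.
shap2 <- function(x) {
  result <- c(W = NA_real_, p = NA_real_)  # same names on every return path
  x <- x[!is.na(x)]                        # assumption: NAs are simply dropped
  if (length(x) > 3 && var(x) > 0) {       # same guards as the original shap()
    res <- shapiro.test(x)
    result["W"] <- res$statistic           # element assignment keeps result's own names
    result["p"] <- res$p.value
  }
  result
}

# Made-up data: one usable row and one row that is all NA except one value.
m <- rbind(c(2.1, 3.4, 1.9, 5.6, 4.2, 3.3),
           c(NA, NA, NA, NA, NA, 0.5))
res <- apply(m, 1, shap2)
print(res)
```

With this version apply() sees the names "W" and "p" for every row, whether or not the test was actually run.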
It appears to me that there is a bug in apply(), in that it cannot deal
gracefully with this somewhat unusual situation. However some blame may
be attributable to me or to shapiro.test(), so I leave it for the
R-gurus to look at and to forward to R-bugs if appropriate.
Clive Jenkins.
-------------- next part --------------
NA 0.000632 NA 0.009640 NA 0.000632 NA -0.001176 0.004235 NA 0.004235 0.002418 0.011395 0.002433 0.004235 -0.001170 -0.001170 0.000632 0.002433 0.000623 0.000632 -0.001170 0.006037 0.004235 0.004235 0.002418 NA NA NA 0.002418 NA 0.009640 0.000623 NA NA 0.004213 NA -0.001170 0.009600 0.006037 -0.001170 0.006009 -0.001173 0.002433 0.004235 NA NA NA NA NA 0.009640 0.002433 0.004235 NA 0.002433 -0.001170 0.004235 -0.001173 -0.001170 -0.001170
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 0.015956 NA NA NA NA