[R] problem (bug?) with prelim.norm (package norm)
Andreas Wolf
andreas.wolf at uni-jena.de
Thu Feb 24 12:37:09 CET 2005
dear list members,
there seems to be a problem with the prelim.norm function (package norm)
as the number of items in the dataset increases.
the output of prelim.norm() is a list of summary statistics, one of
which is the missingness indicator matrix "r". it lists all patterns of
missing data, with a count of how often each pattern occurred in the
dataset. as the number of items and the number of patterns increase, it
seems to malfunction: it stops after fewer than 200 patterns, and the
count for the last row/pattern equals the number of subjects minus the
number of patterns listed before it.
let's give an example: i generate multivariate normal data for 40
variables and 500 observations. i randomly delete 10 percent of the
values for each person (i.e. set 4 of the 40 values to NA). as the
number of possible patterns of missings (combinations without
repetition: 40 choose 4) is 91390, you'd expect to have (almost) as many
different patterns of missings as subjects in the dataset (~ 500).
however, after running prelim.norm, the "r" matrix shows only some 170
patterns (the number varies between runs!), and the last pattern is
reported to occur some 320 times in the dataset (which is, of course,
not true if you check).
any ideas?
INPUT:
x <- matrix(rnorm(20000), 500, 40)  # generate 40 variables with 500 observations
for (tmp in 1:500) {
  draw <- sample(1:40, 4, replace = FALSE)
  x[tmp, draw] <- NA
}  # set (random) 10 percent of values per observation to NA
library(norm)
s <- prelim.norm(x)  # run prelim.norm from package norm
s$r  # missingness indicator matrix (0-missing, 1-observed)
rownames(s$r)[nrow(s$r)]  # count for (supposedly) last pattern
tmp <- which(s$r[nrow(s$r), ] == 0)  # items (supposedly) missing in last pattern
which(is.na(x[, tmp[1]]) & is.na(x[, tmp[2]]) & is.na(x[, tmp[3]]) &
      is.na(x[, tmp[4]]))  # list cases with last pattern
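as a cross-check that does not rely on package norm, the distinct
patterns can also be counted with base R alone. this is only a sketch:
the seed and the names "pat" and "obs" are arbitrary choices made for
illustration, and the data are regenerated so the snippet runs on its
own.

```r
set.seed(1)  # arbitrary seed, only so the run can be repeated
x <- matrix(rnorm(20000), 500, 40)
for (i in 1:500) x[i, sample(1:40, 4)] <- NA  # 4 of 40 values missing per row

# encode each row's missingness pattern as a 0/1 string (1 = observed)
pat <- apply(!is.na(x), 1, function(obs) paste(as.integer(obs), collapse = ""))

length(unique(pat))  # number of distinct missingness patterns
max(table(pat))      # number of rows sharing the most frequent pattern
```

if prelim.norm is counting correctly, the number of rows of s$r should
match the first value and no row name of s$r should exceed the second.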
p.s. it works fine up to 30 items ... hence, it's not due to the
absolute number of patterns, as there are almost as many patterns as
subjects with 3 out of 30 items missing (possible patterns: 30 choose 3
= 4060)
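the two counts of possible patterns quoted in this post can be checked
with base R's choose():

```r
choose(40, 4)  # 4 of 40 items missing: 91390 possible patterns
choose(30, 3)  # 3 of 30 items missing: 4060 possible patterns
```

in both cases the number of possible patterns is far larger than the 500
subjects, so nearly every subject should have a pattern of its own.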
p.p.s. i first thought of the recursion limit in R, but raising it
doesn't help ( options(expressions = 100000) )