[R] problem (bug?) with prelim.norm (package norm)

Uwe Ligges ligges at statistik.uni-dortmund.de
Thu Feb 24 21:49:44 CET 2005


Please report bugs in contributed R packages to the packages' 
maintainers, in this case Alvaro A. Novo (in CC).

Uwe Ligges


Andreas Wolf wrote:

> dear list members,
> there seems to be a problem with the prelim.norm function (package norm)
> as the number of items in the dataset increases.
> 
> the output of prelim.norm() is a list with different summary statistics,
> one of them is the missingness indicator matrix "r". it lists all
> patterns of missing data and a count of how often each pattern occurred
> in the dataset. as the number of items and number of patterns increases,
> it seems to malfunction: it stops after fewer than 200 patterns, and
> the count for the last row/pattern equals the number of subjects minus
> the number of patterns listed before.
> 
> let's give an example: i generate normal data for 40
> variables and 500 observations. i randomly delete 10 percent of the
> values for each person (i.e. set 4 of the 40 values to NA). as the
> number of possible patterns of missings (combinations without
> repetition: 40 choose 4) is 91390, you'd expect to have (almost) as many
> different patterns of missings as subjects in the dataset (~ 500).
> however, running prelim.norm, the "r" matrix indicates only some 170
> patterns (the number varies across runs !!), and the last pattern
> supposedly occurs some 320 times in the dataset (which is, of course,
> not true if you check).
> 
> any ideas? 
> 
> 
> INPUT:
> # generate 40 variables with 500 observations
> x <- matrix(rnorm(20000), 500, 40)
> 
> # set 4 random values (10 percent) per observation to NA
> for (tmp in 1:500) {
>   draw <- sample(1:40, 4, replace = FALSE)
>   x[tmp, draw] <- NA
> }
> 
> library(norm)
> s <- prelim.norm(x)   # run prelim.norm from package norm
> s$r   # missingness indicator matrix (0-missing, 1-observed)
> # count for (supposedly) last pattern
> dimnames(s$r)[[1]][length(s$r[,1])]
> 
> # vector of items (supposedly) missing in last pattern
> tmp <- which(s$r[length(s$r[,1]),] == 0)
> # list cases with last pattern
> which(is.na(x[,tmp[1]]) & is.na(x[,tmp[2]]) & is.na(x[,tmp[3]]) &
>   is.na(x[,tmp[4]]))
> 
> 
> 
> 
> p.s. it works fine up to 30 items ... hence, it's not due to the
> absolute number of patterns, as there are almost as many patterns as
> subjects with 3 out of 30 items missing (possible patterns: 30 choose 3
> = 4060)
> 
> p.p.s. i first thought of the recursion limit in R, but raising it
> doesn't help ( options(expressions = 100000) )
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
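
The expectation described in the quoted message (roughly as many distinct missingness patterns as subjects) can be checked without the norm package. The following is a minimal sketch in base R only; the seed and the names n, p, k, pattern, counts, n_patterns and max_count are assumptions introduced for illustration and are not part of the original post or of package norm.

```r
set.seed(1)  # assumption: fixed seed, only to make the sketch repeatable

n <- 500  # observations
p <- 40   # variables
k <- 4    # values set to NA per observation (10 percent of 40)

# same simulation as in the quoted INPUT
x <- matrix(rnorm(n * p), n, p)
for (i in seq_len(n)) {
  x[i, sample(p, k)] <- NA
}

# number of possible patterns: 40 choose 4
n_possible <- choose(p, k)

# encode each row's pattern as a 0/1 string (0 = missing, 1 = observed)
pattern <- apply(!is.na(x), 1, function(row) paste(as.integer(row), collapse = ""))

# tabulate how often each distinct pattern occurs
counts <- table(pattern)
n_patterns <- length(counts)  # number of distinct patterns in the data
max_count <- max(counts)      # frequency of the most common pattern

n_possible
n_patterns
max_count
```

With so many possible patterns and only 500 rows, n_patterns should come out close to 500 and max_count should stay very small. Comparing these two numbers with nrow(s$r) and the count attached to the last row of s$r from the quoted example shows whether the summary returned by prelim.norm agrees with the data.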



