[R] patterns of missing data: determining monotonicity
Michael Friendly
friendly at yorku.ca
Thu Jan 6 17:50:00 CET 2005
Here is a problem that perhaps someone out there has an idea about. It
vaguely reminds me of something I've seen before, but I can't place it.
Can anyone help?
For multiple imputation, simpler methods are available if the patterns
of missing data are 'monotone': for every observation, if Vj is missing
then all variables Vk, k>j, are also missing. More complex methods are
required when the patterns are not monotone. The problem is to determine
whether, for a collection of variables, there is an ordering of them that
gives a monotone missing data pattern, or, if not, what the longest
monotone sequence of variables is.
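
To make the definition concrete, here is a minimal sketch (in Python
rather than R, purely for illustration) of the check for one fixed
ordering of the variables. The function names and the convention
True = missing are assumptions made for this example.

```python
def is_monotone(pattern):
    """Return True if, once a variable is missing, all later ones are too.

    `pattern` is a sequence of booleans in variable order,
    with True meaning "missing" (an assumed convention).
    """
    seen_missing = False
    for missing in pattern:
        if seen_missing and not missing:
            # An observed value after a missing one breaks monotonicity.
            return False
        seen_missing = seen_missing or missing
    return True


def all_monotone(patterns):
    """A dataset is monotone for this ordering if every pattern is."""
    return all(is_monotone(p) for p in patterns)
```

This only tests a given ordering; it does not search for one.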
Here is an example where, in a dataset of 65 observations, there are 8
different patterns of missingness, with X and . representing observed
and missing:
Group   V2   V3   V4   V5   V6   V7   V8   V9  V10  V11  nmiss
    1    X    X    X    X    X    X    X    X    X    X      0
    2    X    X    X    X    X    X    .    X    X    X      1
    3    X    X    X    X    X    .    X    X    X    X      1
    4    X    X    X    X    X    .    .    X    X    X      2
    5    X    X    .    X    .    X    X    X    X    X      2
    6    X    X    .    .    X    X    X    X    X    X      2
    7    X    X    .    .    X    .    X    X    X    X      3
    8    X    X    .    .    .    X    X    X    X    X      3
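
As a sketch of how the table above can be turned into a binary matrix
and summarized, the following Python snippet recodes X/. as
False/True (True = missing) and counts missing values per pattern and
per variable. The variable names used in the code are made up for this
example; the data are the eight patterns listed above.

```python
names = ["V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11"]

# The eight missingness patterns: X = observed, . = missing.
patterns_text = [
    "X X X X X X X X X X",
    "X X X X X X . X X X",
    "X X X X X . X X X X",
    "X X X X X . . X X X",
    "X X . X . X X X X X",
    "X X . . X X X X X X",
    "X X . . X . X X X X",
    "X X . . . X X X X X",
]

# Binary matrix: one row per pattern, True where the variable is missing.
missing = [[cell == "." for cell in row.split()] for row in patterns_text]

# Number of missing variables in each pattern (the nmiss column).
nmiss_by_pattern = [sum(row) for row in missing]

# Number of patterns in which each variable is missing.
nmiss_by_var = {
    name: sum(row[j] for row in missing) for j, name in enumerate(names)
}
```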
Treated as a binary matrix (0 = observed, 1 = missing), one can sort the
columns by the number of missing values for each variable (the bottom
row below). Monotone then means that there are at most 2 runs, a string
of 0s followed by all 1s, for *all* patterns. But how does one determine
an ordering (or orderings) of variables of maximal length?
Group   V2   V3   V9  V10  V11   V6   V8   V5   V7   V4  nmiss
    1    0    0    0    0    0    0    0    0    0    0      0
    2    0    0    0    0    0    0    1    0    0    0      1
    3    0    0    0    0    0    0    0    0    1    0      1
    4    0    0    0    0    0    0    1    0    1    0      2
    5    0    0    0    0    0    1    0    0    0    1      2
    6    0    0    0    0    0    0    0    1    0    1      2
    7    0    0    0    0    0    0    0    1    1    1      3
    8    0    0    0    0    0    1    0    1    0    1      3
        ==   ==   ==  ===  ===   ==   ==   ==   ==   ==
         0    0    0    0    0    2    2    3    3    4
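
One possible way to attack the maximal-length question, offered as a
sketch rather than a definitive answer: a set of variables can be put in
monotone order exactly when, for every pair, the patterns in which one
variable is missing are a subset of the patterns in which the other is
missing. The longest monotone sequence is then a longest chain of nested
"missing sets", which a simple dynamic program finds after sorting by
missing count. The Python code below illustrates this on the example
data; the function name and data layout are assumptions made for this
example.

```python
names = ["V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11"]
rows = [
    "X X X X X X X X X X",
    "X X X X X X . X X X",
    "X X X X X . X X X X",
    "X X X X X . . X X X",
    "X X . X . X X X X X",
    "X X . . X X X X X X",
    "X X . . X . X X X X",
    "X X . . . X X X X X",
]

# For each variable, the set of pattern indices in which it is missing.
miss_sets = {
    name: frozenset(i for i, row in enumerate(rows) if row.split()[j] == ".")
    for j, name in enumerate(names)
}


def longest_monotone_order(miss_sets):
    """Longest sequence of variables whose missing sets are nested."""
    # Sorting by missing count guarantees subsets come before supersets.
    order = sorted(miss_sets, key=lambda v: len(miss_sets[v]))
    best, prev = {}, {}
    for k, v in enumerate(order):
        best[v], prev[v] = 1, None
        for u in order[:k]:
            # u may precede v if every pattern missing u is also missing v.
            if miss_sets[u] <= miss_sets[v] and best[u] + 1 > best[v]:
                best[v], prev[v] = best[u] + 1, u
    if not order:
        return []
    # Walk back from the variable that ends the longest chain.
    end = max(order, key=lambda v: best[v])
    chain = []
    while end is not None:
        chain.append(end)
        end = prev[end]
    return chain[::-1]


result = longest_monotone_order(miss_sets)
print(result)  # ['V2', 'V3', 'V9', 'V10', 'V11', 'V6', 'V4']
```

For these data the longest monotone sequence has 7 of the 10 variables.
The answer is not unique: V5 could replace V6, since the patterns
missing V5 are also all missing V4. V7 and V8 fit in neither chain.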
--
Michael Friendly               Email: friendly at yorku.ca
Professor, Psychology Dept.
York University                Voice: 416 736-5115 x66249  Fax: 416 736-5814
4700 Keele Street              http://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT  M3J 1P3 CANADA