[R] missing values

Jonathan Baron baron at psych.upenn.edu
Tue Apr 26 17:12:17 CEST 2005


On 04/26/05 12:54, Ted Harding wrote:
> Would you be kind enough to give sufficient detail to reproduce
> such a case? I've used 'norm' (and 'cat' and 'mix') quite
> extensively, without encountering non-sensible results (at any
> rate in situations where the packages were not being abused,
> which one can do in certain circumstances -- imputing missing
> values can depend quite strongly on supplying realistic constraints,
> and on not expecting too much when the proportion of missing data
> is substantial: this methodology does not have magical powers!).

OK.  Here you go.  First the data without any names:

41,43,41,43,44
43,40,40,42,41
43,44,NA,43,44
42,43,NA,44,44
41,44,42,42,42
43,43,41,42,42
47,48,46,47,46
39,35,35,39,38
40,39,36,40,38
40,40,40,40,40
48,46,46,48,46
45,45,42,44,45
41,40,40,41,41
40,39,37,40,38
41,42,40,41,41
41,42,41,43,43
46,46,45,46,46
40,40,41,40,41
39,41,40,41,41
40,43,38,40,39
37,36,37,36,39
45,46,45,46,46
43,44,42,43,44
42,42,48,42,43
45,46,45,46,45
37,36,36,36,38
37,34,39,37,39
NA,43,41,44,43
45,44,45,44,45
38,38,37,39,38
45,44,44,44,45
NA,42,43,43,43
45,45,44,44,45
40,35,37,40,38
43,43,43,43,43
39,34,37,36,39
38,38,38,39,39
43,41,40,42,43
46,43,42,45,45
46,45,41,44,44
40,40,38,39,40
39,37,39,38,39
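
For anyone pasting the listing above directly: it has no header line, and read.csv() expects one by default, so the first row would be consumed as column names. A minimal sketch of reading it without a header, assuming the data are supplied inline (only the first four rows are repeated here, and the names txt and m are just for illustration):

```r
# First four rows of the listing, supplied inline for illustration.
txt <- "41,43,41,43,44
43,40,40,42,41
43,44,NA,43,44
42,43,NA,44,44"

# header = FALSE keeps the first row as data; columns are named V1..V5.
m <- as.matrix(read.csv(text = txt, header = FALSE))

dim(m)             # rows and columns read
colSums(is.na(m))  # number of missing ratings per column
```

With a file on disk, the same header = FALSE argument can be passed to read.csv() alongside the file name.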

Now the commands I used in norm, and the result:

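library(norm)                            # prelim.norm, em.norm, rngseed, imp.norm
m1 <- as.matrix(read.csv("test.data"))   # ratings matrix, NA = missing
s1 <- prelim.norm(m1)                    # preliminary manipulations for norm
thetahat <- em.norm(s1)                  # parameter estimates via EM
rngseed(1234564)                         # seed must be set before imp.norm
ximp <- imp.norm(s1,thetahat,m1)         # impute the missing values
ximp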

1  41.00000 43 41.00000 43 44
2  43.00000 40 40.00000 42 41
3  43.00000 44 43.72409 43 44
4  42.00000 43 43.36864 44 44
5  41.00000 44 42.00000 42 42
6  43.00000 43 41.00000 42 42
7  47.00000 48 46.00000 47 46
8  39.00000 35 35.00000 39 38
9  40.00000 39 36.00000 40 38
10 40.00000 40 40.00000 40 40
11 48.00000 46 46.00000 48 46
12 45.00000 45 42.00000 44 45
13 41.00000 40 40.00000 41 41
14 40.00000 39 37.00000 40 38
15 41.00000 42 40.00000 41 41
16 41.00000 42 41.00000 43 43
17 46.00000 46 45.00000 46 46
18 40.00000 40 41.00000 40 41
19 39.00000 41 40.00000 41 41
20 40.00000 43 38.00000 40 39
21 37.00000 36 37.00000 36 39
22 45.00000 46 45.00000 46 46
23 43.00000 44 42.00000 43 44
24 42.00000 42 48.00000 42 43
25 45.00000 46 45.00000 46 45
26 37.00000 36 36.00000 36 38
27 37.00000 34 39.00000 37 39
28 44.13337 43 41.00000 44 43
29 45.00000 44 45.00000 44 45
30 38.00000 38 37.00000 39 38
31 45.00000 44 44.00000 44 45
32 41.25152 42 43.00000 43 43
33 45.00000 45 44.00000 44 45
34 40.00000 35 37.00000 40 38
35 43.00000 43 43.00000 43 43
36 39.00000 34 37.00000 36 39
37 38.00000 38 38.00000 39 39
38 43.00000 41 40.00000 42 43
39 46.00000 43 42.00000 45 45
40 46.00000 45 41.00000 44 44
41 40.00000 40 38.00000 39 40
42 39.00000 37 39.00000 38 39

What seemed odd to me (though maybe they aren't) were the imputed
values in rows 3 and 4.  They seemed high, given what I know of the
rater in question and the students.  Here is the output of transcan
for the same cases, which looks more in line with what I expected:

1  41.00000 43 41.00000 43 44
2  43.00000 40 40.00000 42 41
3  43.00000 44 43.09469 43 44
4  42.00000 43 43.39897 44 44
5  41.00000 44 42.00000 42 42
6  43.00000 43 41.00000 42 42
7  47.00000 48 46.00000 47 46
8  39.00000 35 35.00000 39 38
9  40.00000 39 36.00000 40 38
10 40.00000 40 40.00000 40 40
11 48.00000 46 46.00000 48 46
12 45.00000 45 42.00000 44 45
13 41.00000 40 40.00000 41 41
14 40.00000 39 37.00000 40 38
15 41.00000 42 40.00000 41 41
16 41.00000 42 41.00000 43 43
17 46.00000 46 45.00000 46 46
18 40.00000 40 41.00000 40 41
19 39.00000 41 40.00000 41 41
20 40.00000 43 38.00000 40 39
21 37.00000 36 37.00000 36 39
22 45.00000 46 45.00000 46 46
23 43.00000 44 42.00000 43 44
24 42.00000 42 48.00000 42 43
25 45.00000 46 45.00000 46 45
26 37.00000 36 36.00000 36 38
27 37.00000 34 39.00000 37 39
28 43.80165 43 41.00000 44 43
29 45.00000 44 45.00000 44 45
30 38.00000 38 37.00000 39 38
31 45.00000 44 44.00000 44 45
32 42.91116 42 43.00000 43 43
33 45.00000 45 44.00000 44 45
34 40.00000 35 37.00000 40 38
35 43.00000 43 43.00000 43 43
36 39.00000 34 37.00000 36 39
37 38.00000 38 38.00000 39 39
38 43.00000 41 40.00000 42 43
39 46.00000 43 42.00000 45 45
40 46.00000 45 41.00000 44 44
41 40.00000 40 38.00000 39 40
42 39.00000 37 39.00000 38 39

The commands here were

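library(Hmisc)  # provides transcan
s.imp <- transcan(m1,asis="*",data=m1,imputed=TRUE,long=TRUE,pl=FALSE)
s.na <- is.na(m1) # which ratings are imputed
# unlist() goes column by column, matching the column-major order of which()
m1[which(s.na)] <- unlist(s.imp$imputed)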

(I wish I could find a more elegant way to replace the NAs.)
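
One possibly tidier way to do the replacement is to fill each column by name, so nothing depends on the ordering of unlist() and which(). The sketch below is only an illustration: it uses a toy matrix and a hand-made list rather than real transcan output, it assumes the imputed values come as a named list with one numeric vector per column, and fill_na is a hypothetical helper, not part of any package.

```r
# Toy ratings matrix with missing values (hypothetical, not the data above).
m <- matrix(c(41, NA, 43,
              43, 40, NA,
              41, 42, 44),
            nrow = 3,
            dimnames = list(NULL, c("r1", "r2", "r3")))

# Hypothetical imputations: one numeric vector per column with NAs.
imputed <- list(r1 = 42.5, r2 = 41.2)

# fill_na is a made-up helper: it replaces the NAs in each named column
# with the values supplied for that column, in row order.
fill_na <- function(x, imputed) {
  for (v in names(imputed)) {
    miss <- is.na(x[, v])
    if (!any(miss)) next                       # nothing to fill in this column
    stopifnot(sum(miss) == length(imputed[[v]]))  # counts must agree
    x[miss, v] <- imputed[[v]]
  }
  x
}

filled <- fill_na(m, imputed)
filled
```

The per-column check stops with an error if the number of supplied values does not match the number of NAs in a column, which the one-line version would not catch as clearly.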

Jon
-- 
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron