[R] strategy for doing an ANOVA on unbalanced data

Steven Lacey slacey at umich.edu
Thu Dec 14 03:49:47 CET 2006


Hi, 

I would like some help deciding if and how to average my data before running
an ANOVA. Let me first describe the data and what makes it unique.
Hopefully, that will generate some ideas because I am not sure what kind of
model I need to use. I don't know how to describe this succinctly, so please
bear with me. This is my last analysis for my dissertation and I really
could use some help.

I have 3 factors:
	
Mapping: 5 levels, between-subjects 
Finger: 4 levels, within-subjects
Subject: 40 subjects, 8 nested within each level of mapping

Mapping is crossed with finger. So, I have 160 observations; 5 mappings x 4
fingers x 8 subjects. However, these observations are not all independent as
there are only 40 subjects. The 4 observations per subject are not
replicates, as they are observed under different conditions. For each of
these 160 "experimental units" I have multiple dependent variables. 
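
To make the layout concrete, here is a minimal sketch of the design in R. The
assignment of subject IDs to mappings in blocks of 8 is an assumption (it
matches the small tables further down), and the object names are only
illustrative.

```r
# Hypothetical reconstruction of the design: subjects 1-40 are assigned to the
# five mappings in blocks of 8 (an assumption, not taken from the raw data).
mappings <- c("color", "shape", "letter", "compatible", "incompatible")
fingers  <- c("index", "middle", "ring", "little")

subjects <- data.frame(
  Subject = factor(1:40),
  mapping = factor(rep(mappings, each = 8), levels = mappings)
)

# No shared columns, so merge() returns the Cartesian product:
# 40 subjects x 4 fingers = 160 rows.
design <- merge(subjects, data.frame(finger = factor(fingers, levels = fingers)))

nrow(design)                          # number of experimental units
table(design$mapping, design$finger)  # observations per mapping x finger cell
table(table(design$Subject))          # rows contributed by each subject
```

Each of the 20 mapping x finger cells holds 8 rows, and each subject appears
once per finger, which is why the 160 rows are not independent.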

I take one of these dependent variables, nd, and perform a cluster analysis
nested within the 20 levels of mapping x finger. That is, I take the 8
values for each combination of mapping and finger and cluster them. Each
time I request a 3-cluster solution. This adds a fourth factor to the
analysis, cluster. 
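
A rough sketch of the per-cell clustering step follows. The nd values are
simulated because they are not part of this post, and kmeans() is only a
stand-in: the clustering method is not stated here, so treat the choice of
algorithm and the relabelling convention as assumptions.

```r
set.seed(1)  # simulated values only; nd itself is not included in this post

mappings <- c("color", "shape", "letter", "compatible", "incompatible")
fingers  <- c("index", "middle", "ring", "little")

# 8 subjects per mapping x finger cell, with a made-up nd for each.
d <- expand.grid(subject_in_cell = 1:8, finger = fingers, mapping = mappings)
d$nd <- rnorm(nrow(d))

# Cluster the values of one cell into k groups and relabel the groups
# 1..k in order of increasing cluster centre (an arbitrary convention).
cluster_cell <- function(x, k = 3) {
  km <- kmeans(x, centers = k, nstart = 10)
  rank(km$centers[, 1])[km$cluster]
}

# ave() applies cluster_cell() separately within each mapping x finger cell.
d$cluster <- factor(ave(d$nd, d$mapping, d$finger, FUN = cluster_cell))

# Cell-by-cluster counts: every cell gets all 3 clusters, with unequal sizes.
table(interaction(d$mapping, d$finger), d$cluster)
```

Because the clustering is rerun inside every cell, cluster membership counts
differ from cell to cell, which is the source of the imbalance described next.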

I want to know if some other dependent variable, ti, which is not involved
in the clustering, differs by cluster. How do I test this? 

I quickly run into a couple of problems.
1) The observations at each level of cluster are neither entirely
independent nor entirely dependent. How do I handle this?
2) The mean of each cluster is "incorrect". The mean is biased by the number
of experimental units at each level of mapping x finger that contribute to
that cluster. 

Mapping   Finger   Subject:   1   2   3   4   5   6   7   8
Color     index    Cluster:   2   3   1   1   1   2   1   2

Mapping   Finger   Subject:   9  10  11  12  13  14  15  16
Shape     middle   Cluster:   2   2   2   2   3   2   1   2

For instance, the values entered into cluster 1 are biased by the fact that
4 of the experimental units are from the color-index cell and only
1 is from the shape-middle cell. Cluster membership is not balanced with
respect to levels of mapping x finger. I would like the mean for each
cluster to be "unweighted" with respect to levels of mapping x finger. That
is, I would like the modeled mean for cluster 1 to be the mean of 20 means,
one from each level of mapping x finger for this cluster. How would I
achieve this?
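
To illustrate the difference between the two kinds of mean, here is a small
sketch using only the two cells shown above. The cluster labels are copied
from those tables; the ti values are invented so the arithmetic is easy to
check, and the object names are hypothetical.

```r
# Cluster labels come from the two cells shown above; the ti values are
# made up for illustration and are not the real data.
toy <- data.frame(
  cell    = rep(c("color-index", "shape-middle"), each = 8),
  cluster = c(2, 3, 1, 1, 1, 2, 1, 2,
              2, 2, 2, 2, 3, 2, 1, 2),
  ti      = c(0, 0, 10, 10, 10, 0, 10, 0,
              0, 0, 0, 0, 0, 0, 50, 0)
)

# Pooled mean: every experimental unit counts once, so the cell with more
# members in a cluster dominates that cluster's mean.
pooled <- tapply(toy$ti, toy$cluster, mean)

# Unweighted mean: average within each cell first, then average the cell means.
cell_means <- tapply(toy$ti, list(toy$cell, toy$cluster), mean)
unweighted <- colMeans(cell_means)

pooled["1"]      # (4 * 10 + 50) / 5 = 18
unweighted["1"]  # (10 + 50) / 2 = 30
```

With the full data, the same two steps would give each cluster a mean of 20
cell means, one per level of mapping x finger.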

My solution has been to average over the experimental units at each
combination of mapping, finger, and cluster. This yields 60 averaged
observations. The data look like the following:
	
			Cluster
Mapping-finger	1	2	3
Color-index
Color-middle
Color-ring
Color-little
Shape-index
. . .
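
The averaging step can be sketched as below. The data are simulated and the
cluster membership patterns are invented; only the structure (8 units per
cell, 3 clusters per cell, unequal cluster sizes) is meant to mirror the real
data.

```r
set.seed(2)  # simulated ti; only the structure mirrors the data

mappings <- c("color", "shape", "letter", "compatible", "incompatible")
fingers  <- c("index", "middle", "ring", "little")

d <- expand.grid(subject_in_cell = 1:8, finger = fingers, mapping = mappings)
# Two made-up, unbalanced membership patterns, alternating across the 20 cells.
d$cluster <- factor(rep(c(1, 1, 1, 1, 2, 2, 2, 3,
                          1, 2, 2, 2, 2, 2, 3, 3), times = 10))
d$ti <- rnorm(nrow(d), sd = 50)

# One mean per mapping x finger x cluster combination: 20 cells x 3 = 60 rows.
avg <- aggregate(ti ~ mapping + finger + cluster, data = d, FUN = mean)
nrow(avg)

# Wide layout: one row per mapping-finger cell, one column per cluster.
avg$cell <- interaction(avg$mapping, avg$finger, sep = "-")
wide <- tapply(avg$ti, list(avg$cell, avg$cluster), mean)
dim(wide)
```

Note that each averaged value is based on a different number of experimental
units, and that information is discarded once the means are taken.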

I then do a repeated-measures ANOVA, treating cluster as a within-subjects
factor and the mapping-finger factor (20 levels) as a random effect. This seems
reasonable because the observations are matched by the mapping-finger
factor. The effect of cluster is significant. But is this analysis
legitimate? What happened to the variability between the unaveraged
experimental units? Is that indirectly represented in the model? If so,
where and how? Can I talk about the effect of cluster as affecting
individual subjects even though the random factor is mapping-finger, not
subjects?

Error: rep(gl(20, 1), 3)
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals 19  78596    4137               

Error: rep(gl(20, 1), 3):gl(3, 20)
          Df Sum Sq Mean Sq F value   Pr(>F)   
gl(3, 20)  2  19820    9910  6.9592 0.002659 **
Residuals 38  54113    1424                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
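
For reference, a call of roughly the following shape produces strata like the
ones printed above. The Error(cell / cluster) form is inferred from the stratum
labels in the output, the factor names are hypothetical, and the response is
simulated, so only the degrees of freedom are comparable.

```r
set.seed(3)  # simulated responses; the real averaged values are not listed

# 60 averaged observations: 20 mapping-finger cells x 3 clusters.
avg <- expand.grid(cell = factor(1:20), cluster = factor(1:3))
avg$ti <- rnorm(nrow(avg), sd = 40)

# Cells play the role of "subjects"; cluster varies within each cell.
fit <- aov(ti ~ cluster + Error(cell / cluster), data = avg)
summary(fit)

# Degrees of freedom match the printed table: 19 between cells, then 2 for
# cluster and 38 residual within cells.
fit[["cell"]]$df.residual          # 19
fit[["cell:cluster"]]$df.residual  # 38
```

The 38 residual degrees of freedom come from the cell-by-cluster interaction
(19 x 2); nothing in this model refers to individual subjects.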

I have included the unaveraged data below. 

Thank you for any help you can provide, 

Steve


"tmp1" <-
structure(list(mapping = structure(as.integer(c(1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 
3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 
5, 5, 5, 5, 5, 5, 5, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 
3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 
5, 5, 5, 5, 5, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 
3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 
4, 5, 5, 5, 5, 5, 5, 5)), .Label = c("color", "shape", "letter", 
"compatible", "incompatible"), class = "factor"), finger = structure(as.integer(c(1, 
2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 
1, 1, 2, 2, 2, 2, 3, 4, 4, 1, 2, 3, 3, 3, 3, 4, 1, 2, 2, 2, 2, 
2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 1, 2, 2, 3, 3, 4, 1, 2, 2, 3, 
3, 3, 4, 4, 1, 2, 3, 3, 3, 3, 4, 4, 4, 1, 2, 2, 2, 2, 3, 3, 4, 
4, 1, 1, 1, 2, 2, 3, 3, 3, 4, 1, 1, 1, 1, 1, 1, 2, 3, 3, 3, 4, 
4, 4, 4, 4, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 1, 1, 1, 1, 
1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 
4, 4, 4, 4, 4, 1, 1, 1, 1, 2, 3, 4)), .Label = c("index", "middle", 
"ring", "little"), class = "factor"), Subject = structure(as.integer(c(1, 
1, 2, 4, 6, 7, 2, 7, 8, 1, 2, 10, 11, 9, 13, 15, 10, 11, 15, 
10, 12, 16, 21, 22, 21, 22, 23, 24, 23, 21, 23, 30, 27, 25, 29, 
30, 32, 31, 36, 33, 34, 35, 38, 40, 33, 36, 39, 40, 33, 35, 36, 
38, 39, 40, 4, 5, 8, 1, 6, 6, 12, 10, 12, 9, 12, 16, 11, 15, 
19, 19, 17, 20, 21, 24, 19, 22, 24, 31, 28, 30, 31, 32, 27, 28, 
26, 27, 33, 35, 38, 36, 37, 35, 37, 38, 37, 2, 3, 5, 6, 7, 8, 
3, 3, 4, 5, 3, 4, 5, 7, 8, 9, 13, 14, 15, 16, 11, 14, 16, 13, 
14, 9, 13, 14, 17, 18, 20, 23, 24, 17, 18, 20, 18, 19, 22, 17, 
18, 20, 25, 26, 27, 28, 29, 32, 25, 26, 29, 26, 31, 25, 28, 29, 
30, 32, 34, 37, 39, 40, 39, 34, 34)), .Label = c("1", "2", "3", 
"4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", 
"16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", 
"27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37", 
"38", "39", "40"), class = c("ordered", "factor")), cluster = structure(as.integer(c(1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3)), .Label = c("slowest", "moderate", 
"fastest"), class = c("ordered", "factor")), ti = c(63.1004276803134, 
-23.2992754096397, -135.957917918668, 42.7865713093507, 40.6495491827651, 
30.9835635847383, -140.456315927602, 16.1884759276633, -89.540435241939, 
31.7971254876269, -29.4232163522977, -25.1278025126572, -12.4566535167804, 
37.4289344617898, 52.5706302338991, -35.8413047438251, 15.2155259231272, 
2.79001951428484, -74.016437811255, 34.0472876514206, 8.23236516636759, 
41.752385308372, 6.86452959238867, -9.04240128098512, -35.5546654074403, 
-25.5058760190386, -31.6500756806436, -80.0839944596871, -178.282994307473, 
14.0652868642960, -54.1115243880536, -51.9682404685031, -54.9501041501482, 
-42.7431807244454, 16.6243910954292, -22.4431847176876, -14.7035707173501, 
-18.1056055182825, -49.7109985071331, -65.2909026882744, 119.501461368539, 
-27.0759830958289, 105.287851617557, -95.9423287716085, -86.0069936272954, 
-59.0894321514947, 20.8884421023127, -118.656165569357, -63.9198321967824, 
1.96187161119022, -153.768707101974, 49.3725736606709, 63.7546555849159, 
-12.7991590297592, 56.8196174117707, -9.31749776409478, -13.2032252050034, 
8.7394525690294, 26.8836983079709, 74.8622974499625, -4.27206914920544, 
33.6865400099209, -81.0876079273623, -9.57784097773667, 3.53941649430584, 
-43.1966556137174, 19.9505522181613, 31.1093048715892, -2.81846107124408, 
-16.4634071989511, -56.2626091719525, 5.8894597113893, -11.9460426501163, 
-50.5016604604713, 32.5846987625618, -6.58024973176658, 24.9634141295032, 
-29.7109602506806, -59.2110304133438, -33.073404186089, -71.242648484072, 
-59.7536863171843, -18.9735253330240, -56.0906981355611, -48.8598997546233, 
-23.1287373512131, -84.6533103813389, 21.5708032453866, 8.21368474082069, 
16.4365994699192, 71.586596869219, -45.6241843743441, 235.334517416515, 
101.332858707625, 140.604896021264, 12.1196300208124, 122.132015696109, 
37.0374068952659, 75.5952402766181, 94.4578687986073, 53.9870738026712, 
67.1856211307736, 77.1640848276398, 50.8498395189965, -78.4933065893263, 
95.1069600779013, 73.149329272702, -4.9339023803459, 113.586145614935, 
21.0193638623337, 22.883456523432, 25.435232226029, 24.3953597764337, 
18.6575664551592, -29.0022097713646, 10.4364866129519, 27.1594565069105, 
10.2399003236453, 14.7130660648562, 65.9229552720177, 72.592470053935, 
74.5394338773406, 67.9822705454202, 119.281862515375, 42.6436509977648, 
48.8182384714562, 16.9719600025627, 51.9587360263067, 47.7585090033625, 
4.61299932897614, -88.703759098076, 14.0860299350686, 13.7666737583254, 
-74.7564309039725, 110.686393495531, 47.5857765231565, 101.863081945050, 
-28.6193204179635, -30.6037063002317, -58.0976833896048, -30.7896122613757, 
-36.7538888196661, -43.1772014522395, -30.636780420773, -33.9709296084748, 
-22.5891682772131, -37.0311490456283, -45.2255618875948, -13.3899512263205, 
7.82870987793496, -47.7610311277551, -0.0190839046758119, -24.7251829424596,
62.6549969288884, -52.5006031302601, -41.0342563813402, -159.347540607221, 
0.608912472459365, -50.5971346100457, 66.386987819285)), .Names = c("mapping", 
"finger", "Subject", "cluster", "ti"), row.names = c("1", "2", 
"3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", 
"15", "16", "17", "18", "19", "20", "21", "22", "23", "24", "25", 
"26", "27", "28", "29", "30", "31", "32", "33", "34", "35", "36", 
"37", "38", "39", "40", "41", "42", "43", "44", "45", "46", "47", 
"48", "49", "50", "51", "52", "53", "54", "55", "56", "57", "58", 
"59", "60", "61", "62", "63", "64", "65", "66", "67", "68", "69", 
"70", "71", "72", "73", "74", "75", "76", "77", "78", "79", "80", 
"81", "82", "83", "84", "85", "86", "87", "88", "89", "90", "91", 
"92", "93", "94", "95", "96", "97", "98", "99", "100", "101", 
"102", "103", "104", "105", "106", "107", "108", "109", "110", 
"111", "112", "113", "114", "115", "116", "117", "118", "119", 
"120", "121", "122", "123", "124", "125", "126", "127", "128", 
"129", "130", "131", "132", "133", "134", "135", "136", "137", 
"138", "139", "140", "141", "142", "143", "144", "145", "146", 
"147", "148", "149", "150", "151", "152", "153", "154", "155", 
"156", "157", "158", "159", "160"), class = "data.frame")


