[R] [OT] 1 vs 2-way anova technical question
Giovanni Azua
bravegag at gmail.com
Mon Nov 21 20:04:05 CET 2011
Hello Rob,
Thank you for your suggestions. I tried glm too, without success. In any case, I include all the information below in case someone with good knowledge of this can give me a hand. I take the log of the response variable because:
- its values span multiple orders of magnitude
- the diagnostic plots (e.g. QQ, residuals vs. fitted) improve with the transformation.
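A rough sketch of how the two reasons above can be checked. The data here are simulated stand-ins (the names `sim`, `fit_raw` and `fit_log` are made up for illustration, and the lognormal parameters are arbitrary), so this only shows the mechanics, not my actual results:

```r
# Hypothetical stand-in data; the real `throughput` data frame is not reproduced here.
set.seed(1)
sim <- data.frame(
  No_databases = factor(rep(c("1", "4"), each = 100)),
  Throughput   = c(rlnorm(100, meanlog = 4, sdlog = 0.5),
                   rlnorm(100, meanlog = 7, sdlog = 0.5))
)

# Same one-factor model on the raw and on the log-transformed response
fit_raw <- aov(Throughput ~ No_databases, data = sim)
fit_log <- aov(log(Throughput) ~ No_databases, data = sim)

# How many orders of magnitude the response spans
diff(log10(range(sim$Throughput)))

# Shapiro-Wilk W statistic of the residuals: closer to 1 looks more normal
w_raw <- unname(shapiro.test(residuals(fit_raw))$statistic)
w_log <- unname(shapiro.test(residuals(fit_log))$statistic)
c(raw = w_raw, log = w_log)

# Residuals-vs-fitted and QQ plots for both fits (interactive sessions only)
if (interactive()) {
  op <- par(mfrow = c(2, 2))
  plot(fit_raw, which = 1:2)
  plot(fit_log, which = 1:2)
  par(op)
}
```

With the real data, the same calls would be applied to the fitted aov objects instead of the simulated ones.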
Below I include:
1) general summary of my data
2) 1-way anova and summary of the model
3) 4-way anova and summary of the model
Attached:
a) Overview of the data (showing where the main interactions occur, i.e. No_databases and No_middlewares)
b) Diagnostic plots for 2). Here the normality assumption for the residuals looks reasonable.
c) Diagnostic plots for 3). Here the normality assumption for the residuals does not seem to hold. Does that invalidate the 4-way aov model?
I tried glm and it delivers results similar to 3).
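One possible reason for the similar results: with its default family (gaussian, identity link), glm fits the same least-squares model as lm/aov. A minimal sketch on simulated data (the data frame `sim` and the two-factor formula are assumptions made only for illustration); the Gamma fit at the end is just one alternative one might try on the untransformed response, not something I claim is appropriate here:

```r
# Hypothetical stand-in data with two of the factors from the real design
set.seed(2)
sim <- expand.grid(No_databases = factor(c("1", "4")),
                   Partitioning = factor(c("sharding", "replication")),
                   rep = 1:20)
sim$Throughput <- rlnorm(nrow(sim),
                         meanlog = 4 + 2 * (sim$No_databases == "4"),
                         sdlog = 0.7)

fit_lm  <- lm(log(Throughput) ~ No_databases * Partitioning, data = sim)
fit_glm <- glm(log(Throughput) ~ No_databases * Partitioning, data = sim)  # family = gaussian by default

# The coefficient estimates agree up to numerical tolerance
all.equal(coef(fit_lm), coef(fit_glm))

# An alternative on the raw scale: Gamma errors with a log link
fit_gamma <- glm(Throughput ~ No_databases * Partitioning,
                 family = Gamma(link = "log"), data = sim)
coef(fit_gamma)
```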
My impression is that my system is heavily polluted with outliers. One can see from plot a) how much the mean and the median differ because of them. That is just the way the system I implemented behaves. By the way, the system is a multi-tiered architecture that I developed in Java from scratch; it includes XA and different data access and partitioning patterns. I need to analyze this system quantitatively and draw conclusions from it.
Most of my classmates keep it really simple: run 2^k experiments, take one grand mean out of each experiment, do the ANOVA on those means (i.e. 1 repetition), compute the fraction of variation, and that's it. I am trying to model it more deeply by checking the model assumptions, etc.
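For comparison, the simple approach (one mean per experiment, then fraction of variation) could be sketched roughly as follows. Everything here is simulated and the names `design`, `sim`, `cell_mean` and `frac` are hypothetical; only two factors are used, and the interaction is left in the residual because there is a single value per cell:

```r
# Hypothetical 2^2 design with 30 simulated measurements per experiment
set.seed(3)
design <- expand.grid(No_databases = factor(c("1", "4")),
                      Partitioning = factor(c("sharding", "replication")))
sim <- design[rep(seq_len(nrow(design)), each = 30), ]
sim$Throughput <- rlnorm(nrow(sim),
                         meanlog = ifelse(sim$No_databases == "4", 7, 4),
                         sdlog = 1)

# One summary value per experiment; mean and median differ when outliers are present
cell_mean   <- aggregate(Throughput ~ No_databases + Partitioning, data = sim, FUN = mean)
cell_median <- aggregate(Throughput ~ No_databases + Partitioning, data = sim, FUN = median)

# ANOVA on the four cell means (1 repetition per cell)
fit <- aov(Throughput ~ No_databases + Partitioning, data = cell_mean)
tab <- summary(fit)[[1]]

# Fraction of variation: each sum of squares over the total sum of squares
frac <- tab[["Sum Sq"]] / sum(tab[["Sum Sq"]])
names(frac) <- trimws(rownames(tab))
round(frac, 3)
```

This throws away the within-experiment spread, which is why it cannot say anything about the residual assumptions.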
Many thanks in advance,
Best regards,
Giovanni
> str(throughput)
'data.frame': 479 obs. of 9 variables:
$ Time : num 7 8 9 10 11 12 13 14 15 16 ...
$ Throughput : int 155 155 154 157 155 214 4631 2118 136 132 ...
$ Workload : chr "All" "All" "All" "All" ...
$ No_databases : Factor w/ 2 levels "1","4": 1 1 1 1 1 1 1 1 1 1 ...
$ Partitioning : Factor w/ 2 levels "sharding","replication": 1 1 1 1 1 1 1 1 1 1 ...
$ No_middlewares : Factor w/ 3 levels "1","2","4": 1 1 1 1 1 1 1 1 1 1 ...
$ Queue_size : Factor w/ 2 levels "40","100": 1 1 1 1 1 1 1 1 1 1 ...
$ No_clients : Factor w/ 1 level "64": 1 1 1 1 1 1 1 1 1 1 ...
$ Experimental_error: Factor w/ 1 level "1": 1 1 1 1 1 1 1 1 1 1 ...
> summary(throughput)
      Time         Throughput       Workload         No_databases      Partitioning No_middlewares
 Min.   : 7.00   Min.   :  35.0   Length:479         1:239        sharding   :240   1:160
 1st Qu.:11.50   1st Qu.:  50.5   Class :character   4:240        replication:239   2:159
 Median :16.00   Median : 744.0   Mode  :character                                  4:160
 Mean   :16.48   Mean   : 830.3
 3rd Qu.:21.00   3rd Qu.:1205.5
 Max.   :26.00   Max.   :4631.0
 Queue_size No_clients Experimental_error
 40 :240    64:479     1:479
 100:239
## #######################################################
##
## ANOVA "one-way" (main effects only, no interaction terms)
##
## #######################################################
> throughput.aov <- aov(log(Throughput)~No_databases+Partitioning+No_middlewares+Queue_size,data=throughput)
> throughput.aov
Call:
   aov(formula = log(Throughput) ~ No_databases + Partitioning +
    No_middlewares + Queue_size, data = throughput)

Terms:
                No_databases Partitioning No_middlewares Queue_size Residuals
Sum of Squares      521.5264       5.6971        50.5814     0.4628  476.6826
Deg. of Freedom            1            1              2          1       473

Residual standard error: 1.003885
Estimated effects may be unbalanced
> summary(throughput.aov)
                Df Sum Sq Mean Sq  F value    Pr(>F)
No_databases     1 521.53  521.53 517.4974 < 2.2e-16 ***
Partitioning     1   5.70    5.70   5.6530   0.01782 *
No_middlewares   2  50.58   25.29  25.0953 4.381e-11 ***
Queue_size       1   0.46    0.46   0.4592   0.49833
Residuals      473 476.68    1.01
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
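The fraction of variation per term can be computed directly from the sums of squares printed above. A small sketch using only those printed numbers (the vector name `ss` is arbitrary). Since aov reports sequential sums of squares and warns that the effects may be unbalanced, the split could shift somewhat if the terms were entered in a different order:

```r
# Sums of squares copied from the main-effects aov output above
ss <- c(No_databases   = 521.5264,
        Partitioning   = 5.6971,
        No_middlewares = 50.5814,
        Queue_size     = 0.4628,
        Residuals      = 476.6826)

# Percentage of the total variation of log(Throughput) attributed to each term
pct <- round(100 * ss / sum(ss), 1)
print(pct)
```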
>
## #######################################################
##
## ANOVA with all interactions up to 4-way
##
## #######################################################
> throughput.aov <- aov(log(Throughput)~No_databases*Partitioning*No_middlewares*Queue_size,data=throughput)
> throughput.aov
Call:
   aov(formula = log(Throughput) ~ No_databases * Partitioning *
    No_middlewares * Queue_size, data = throughput)

Terms:
                No_databases Partitioning No_middlewares Queue_size No_databases:Partitioning
Sum of Squares      521.5264       5.6971        50.5814     0.4628                   96.9198
Deg. of Freedom            1            1              2          1                         1
                No_databases:No_middlewares Partitioning:No_middlewares No_databases:Queue_size
Sum of Squares                     110.4102                      8.4819                  0.0916
Deg. of Freedom                           2                           2                       1
                Partitioning:Queue_size No_middlewares:Queue_size
Sum of Squares                   0.0015                    0.2254
Deg. of Freedom                       1                         2
                No_databases:Partitioning:No_middlewares No_databases:Partitioning:Queue_size
Sum of Squares                                   23.6400                               0.0512
Deg. of Freedom                                        2                                    1
                No_databases:No_middlewares:Queue_size Partitioning:No_middlewares:Queue_size
Sum of Squares                                  0.1247                                 0.1511
Deg. of Freedom                                      2                                      2
                No_databases:Partitioning:No_middlewares:Queue_size Residuals
Sum of Squares                                               0.7391  235.8461
Deg. of Freedom                                                   2       455

Residual standard error: 0.7199605
Estimated effects may be unbalanced
> summary(throughput.aov)
                                                     Df Sum Sq Mean Sq   F value    Pr(>F)
No_databases                                          1 521.53  521.53 1006.1413 < 2.2e-16 ***
Partitioning                                          1   5.70    5.70   10.9909 0.0009888 ***
No_middlewares                                        2  50.58   25.29   48.7914 < 2.2e-16 ***
Queue_size                                            1   0.46    0.46    0.8928 0.3452201
No_databases:Partitioning                             1  96.92   96.92  186.9800 < 2.2e-16 ***
No_databases:No_middlewares                           2 110.41   55.21  106.5030 < 2.2e-16 ***
Partitioning:No_middlewares                           2   8.48    4.24    8.1818 0.0003229 ***
No_databases:Queue_size                               1   0.09    0.09    0.1766 0.6744713
Partitioning:Queue_size                               1   0.00    0.00    0.0028 0.9576692
No_middlewares:Queue_size                             2   0.23    0.11    0.2174 0.8046764
No_databases:Partitioning:No_middlewares              2  23.64   11.82   22.8034 3.648e-10 ***
No_databases:Partitioning:Queue_size                  1   0.05    0.05    0.0988 0.7534090
No_databases:No_middlewares:Queue_size                2   0.12    0.06    0.1203 0.8866605
Partitioning:No_middlewares:Queue_size                2   0.15    0.08    0.1457 0.8644517
No_databases:Partitioning:No_middlewares:Queue_size   2   0.74    0.37    0.7129 0.4907654
Residuals                                           455 235.85    0.52
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
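Since the main-effects model is nested in the full model, the two can be compared with a single F test built from the residual sums of squares and degrees of freedom printed in this message. A sketch using only those numbers (the variable names are arbitrary); note that this test relies on the same residual assumptions that look doubtful for the full model, so the p-value should be read with that caveat:

```r
# Residual SS and residual df copied from the two aov outputs in this message
rss_add  <- 476.6826; df_add  <- 473   # main effects only
rss_full <- 235.8461; df_full <- 455   # all interactions up to 4-way

# F statistic for the 18 extra interaction degrees of freedom taken together
f_stat <- ((rss_add - rss_full) / (df_add - df_full)) / (rss_full / df_full)
p_val  <- pf(f_stat, df_add - df_full, df_full, lower.tail = FALSE)
c(F = f_stat, p = p_val)

# With the data at hand, the same comparison would be anova(fit_add, fit_full),
# where fit_add and fit_full are hypothetical names for the two aov fits
# (this message stores both in throughput.aov, so they would need separate names).
```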
Thanks in advance,
Best regards,
Giovanni