[R] Help on choosing the appropriate analysis method
Juhász Péter
peter.juhasz83 at gmail.com
Sun Oct 17 10:26:46 CEST 2010
Dear R-help,
I'd like to ask for your opinion on choosing the "right" strategy for a
particular dataset.
We conducted 24-hour electric field measurements on 90 subjects. They
are grouped by job (2 categories) and location (3 categories). There are
four exposure metrics assigned to each subject.
An excerpt from the data:
n job location M OA UE all
0 job1 dist_200 0.297 0.072 0.171 0.297
1 job1 dist_200 0.083 0.529 0.066 0.529
2 job1 dist_200 0.105 0.145 1.072 1.072
3 job1 dist_200 0.096 0.431 0.099 0.431
4 job1 dist_200 0.137 0.077 0.092 0.137
5 job1 dist_20 NA 0.296 0.107 0.296
6 job1 dist_200 NA 1.595 0.293 1.595
7 job1 dist_20 NA 0.085 0.076 0.085
8 job1 dist_20 NA 2.120 0.319 2.120
9 job1 dist_20 NA 0.881 NA 0.881
10 job1 dist_0 NA 0.221 NA 0.221
80 job2 dist_20 0.800 0.342 1.482 1.482
81 job2 dist_20 NA 0.521 0.050 0.521
82 job2 dist_200 NA 0.497 0.502 0.502
83 job2 dist_200 NA 2.777 NA 2.777
84 job2 dist_20 NA 0.127 0.050 0.127
85 job2 dist_200 NA 2.508 0.423 2.508
86 job2 dist_200 0.216 0.350 2.782 2.782
87 job2 dist_200 NA 2.777 1.996 2.777
88 job2 dist_200 2.348 0.890 2.777 2.777
89 job2 dist_200 NA 0.488 NA 0.488
I'd like to know whether the differences between the group means are
significant. Are pairwise t-tests for location, and a simple two-sample
t-test for job, appropriate in this case?
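data = read.table("data.txt", header=TRUE, nrows=90)
# Pairwise t-tests between locations, Bonferroni-adjusted
loc.all = pairwise.t.test(data$all, data$location, p.adjust.method="bonferroni")
print(loc.all)
loc.M = pairwise.t.test(data$M, data$location, p.adjust.method="bonferroni")
print(loc.M)
loc.OA = pairwise.t.test(data$OA, data$location, p.adjust.method="bonferroni")
print(loc.OA)
loc.UE = pairwise.t.test(data$UE, data$location, p.adjust.method="bonferroni")
print(loc.UE)
# Welch two-sample t-tests between the two jobs
job.all = t.test(all~job, data=data)
print(job.all)
job.M = t.test(M~job, data=data)
print(job.M)
job.OA = t.test(OA~job, data=data)
print(job.OA)
job.UE = t.test(UE~job, data=data)
print(job.UE)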
I'd also like to compare the four exposure metrics. How should I do that?
One potential problem is that none of the exposure metrics is normally
distributed: the distributions are close to lognormal. (In fact, it's even
worse than that: the measuring instrument has a relatively high lower
detection limit, and all readings below it are recorded as the detection
limit. In other words, the non-detects are left-censored.)
Doesn't this make t-tests useless?
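A minimal sketch of two alternatives that may be less sensitive to the skewed distribution, assuming a log transformation or a rank-based test is acceptable for these data. The data frame d is hypothetical: it holds only ten values of the "all" metric copied from the excerpt above (subjects 0-4 and 80-84), so the results are illustrative and say nothing about the full data set.

```r
# Hypothetical subset: ten rows of the "all" metric copied from the excerpt
d <- data.frame(
  job = rep(c("job1", "job2"), each = 5),
  all = c(0.297, 0.529, 1.072, 0.431, 0.137,   # subjects 0-4
          1.482, 0.521, 0.502, 2.777, 0.127)   # subjects 80-84
)

# Welch t-test on the log scale: if the data are roughly lognormal,
# log(all) is roughly normal, and the test compares geometric means
log_fit <- t.test(log(all) ~ job, data = d)
print(log_fit)

# Wilcoxon rank-sum test: uses ranks only, so it does not assume normality.
# Readings recorded at the detection limit would simply appear as ties.
rank_fit <- wilcox.test(all ~ job, data = d)
print(rank_fit)
```

For the three location groups, kruskal.test and pairwise.wilcox.test play the analogous roles. Neither approach models the censoring explicitly; they only reduce the dependence on the normality assumption.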
Thank you in advance,
Péter Juhász