[R] Odd results with Chi-square test. (Not an R problem, but general statistics, I think.)
mik07
someone29_7 at yahoo.de
Tue Aug 18 16:29:03 CEST 2009
Hi,
I am working on a system which automatically answers user questions (such
systems are commonly called "Question Answering systems"). I evaluated
different versions of the same system on a publicly available test set.
Naturally, there is a fixed number of questions in the test set, and the
system answers some right and some wrong.
I want to compare each version of the system against a baseline and see
whether the increase is statistically significant. I used one-tailed
chi-square tests for this.
Here's the data I got:
Test set 1:
            total  incorrect  correct  p
baseline     1908       1718      190
version_1    1908       1698      210  0.145
version_2    1908       1690      218  0.071
version_3    1908       1677      231  0.017
I compared every version with the baseline, so that I get something like a
2x2 contingency table, as here:
            incorrect  correct
baseline         1718      190
version_1        1698      210
p: 0.145
This works fine, the results seem to make sense intuitively.
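A minimal sketch of how such a comparison can be reproduced, assuming a Pearson chi-square test on the 2x2 table without continuity correction, with the one-tailed p taken as the upper tail of the equivalent pooled two-proportion z statistic. The function name one_tailed_p and its argument names are illustrative only and are not taken from any library.

```python
import math

def one_tailed_p(base_wrong, base_right, new_wrong, new_right):
    """One-tailed p for 'the new version has a higher proportion correct'."""
    n_base = base_wrong + base_right
    n_new = new_wrong + new_right
    rate_base = base_right / n_base
    rate_new = new_right / n_new
    # Pooled proportion correct under the null hypothesis of no difference
    pooled = (base_right + new_right) / (n_base + n_new)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_base + 1 / n_new))
    # z squared equals the uncorrected Pearson chi-square statistic (1 df)
    z = (rate_new - rate_base) / se
    # Upper-tail probability of the standard normal distribution
    return 0.5 * math.erfc(z / math.sqrt(2))

# Test set 1: each version compared against the baseline (1718 wrong, 190 right)
for name, wrong, right in [("version_1", 1698, 210),
                           ("version_2", 1690, 218),
                           ("version_3", 1677, 231)]:
    print(name, round(one_tailed_p(1718, 190, wrong, right), 3))
```

Under these assumptions the sketch gives values that agree, to three decimals, with the p values listed for test set 1.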
First question:
Do you think this is a legitimate way to compute significance?
But then I also have figures on *partial* test sets, because there are some
questions for which we just cannot expect the system to return correct
answers. (The reason for this is beyond the scope of this post.) So
different versions of the system work on test sets of different sizes. Then
we get:
Test set 2:
            total  incorrect  correct  p
baseline      898        708      190
version_1     898        688      210  0.128
version_2     898        680      218  0.057
version_3    1021        790      231  0.219
Here, the p value for version_3 (when compared with the baseline) seems to
make no sense whatsoever. It shouldn't be larger than the other two p
values, since the increase in correct answers (that is what counts!) is
bigger after all.
Any idea what's going on here? I thought the sample size should have no
impact on the results?
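For orientation, a small sketch of the proportion of correct answers per row in test set 2, on the assumption that a chi-square test on a 2x2 table compares these proportions rather than the raw counts of correct answers. The names test_set_2 and accuracy are illustrative only.

```python
# (total, correct) per system, copied from the test set 2 figures
test_set_2 = {
    "baseline": (898, 190),
    "version_1": (898, 210),
    "version_2": (898, 218),
    "version_3": (1021, 231),
}

def accuracy(total, correct):
    """Proportion of questions answered correctly."""
    return correct / total

rates = {name: accuracy(total, correct)
         for name, (total, correct) in test_set_2.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.4f}")
```

On these figures version_3 has the most correct answers but, because its total is 1021 rather than 898, not the highest proportion correct.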
Thanks a lot,
Mika