[R] Chi-Square test and survey results
Jan van der Laan
rhelp at eoos.dds.nl
Wed Oct 12 15:24:45 CEST 2011
George,
Perhaps the site of the RISQ project (Representativity indicators for
Survey Quality) might be of use: http://www.risq-project.eu/ . They
also provide R-code to calculate their indicators.
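
As a rough illustration of the idea behind their R-indicator, here is a
sketch that uses region as the only auxiliary variable. This is my own
simplification, not the RISQ code. It assumes the definition R = 1 - 2*S,
where S is the standard deviation of the response propensities over the
population, and it estimates each employee's propensity by the response
rate of his or her region. The counts are copied from your table; all
variable names are mine.

```r
# Counts copied from the ByRegion table in the quoted message.
ByRegion <- data.frame(
  All.Employees      = c(735, 500, 897, 717, 167, 309, 806, 627,
                         858, 851, 336, 1823, 80, 774, 561, 834),
  Survey.Respondents = c(142, 83, 78, 133, 48, 0, 125, 122,
                         177, 160, 52, 312, 9, 121, 24, 134),
  row.names = paste0("Region_", 1:16)
)

N_r     <- ByRegion$All.Employees              # employees per region
rho_r   <- ByRegion$Survey.Respondents / N_r   # regional response rates
N       <- sum(N_r)                            # all employees
rho_bar <- sum(ByRegion$Survey.Respondents) / N  # overall response rate

# Assumed definition: population standard deviation of the propensities,
# with every employee in a region given that region's response rate.
S <- sqrt(sum(N_r * (rho_r - rho_bar)^2) / (N - 1))

r_indicator <- 1 - 2 * S
r_indicator
```

Values close to 1 mean that response rates vary little between regions.
As far as I know the RISQ material also deals with standard errors and
with partial indicators per variable, which this sketch ignores.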
HTH,
Jan
Quoting gheine at mathnmaps.com:
> An organization has asked me to comment on the validity of their
> recent all-employee survey. Survey responses, by geographic region, compared
> with the total number of employees in each region, were as follows:
>
>> ByRegion
>           All.Employees Survey.Respondents
> Region_1            735                142
> Region_2            500                 83
> Region_3            897                 78
> Region_4            717                133
> Region_5            167                 48
> Region_6            309                  0
> Region_7            806                125
> Region_8            627                122
> Region_9            858                177
> Region_10           851                160
> Region_11           336                 52
> Region_12          1823                312
> Region_13            80                  9
> Region_14           774                121
> Region_15           561                 24
> Region_16           834                134
>
> How well does the survey represent the employee population?
> Chi-square test says, not very well:
>
>> chisq.test(ByRegion)
>
> Pearson's Chi-squared test
>
> data: ByRegion
> X-squared = 163.6869, df = 15, p-value < 2.2e-16
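
Note that chisq.test(ByRegion) treats the two columns as independent
samples, although the respondents are a subset of the employees. Two
alternative formulations are sketched here, with the counts copied from
your table: a goodness-of-fit test of the respondent counts against the
employee shares, and a test of response vs. non-response by region. The
variable names are mine, and this is only one possible way to set it up.

```r
# Counts copied from the ByRegion table in the quoted message.
ByRegion <- data.frame(
  All.Employees      = c(735, 500, 897, 717, 167, 309, 806, 627,
                         858, 851, 336, 1823, 80, 774, 561, 834),
  Survey.Respondents = c(142, 83, 78, 133, 48, 0, 125, 122,
                         177, 160, 52, 312, 9, 121, 24, 134),
  row.names = paste0("Region_", 1:16)
)

emp  <- ByRegion$All.Employees
resp <- ByRegion$Survey.Respondents

# (a) Goodness of fit: do the respondent counts follow the employee shares?
gof <- chisq.test(resp, p = emp / sum(emp))

# (b) Homogeneity of response rates: responded vs. did not respond, by region.
resp_tab <- cbind(Responded = resp, Did.Not.Respond = emp - resp)
rownames(resp_tab) <- rownames(ByRegion)
homog <- chisq.test(resp_tab)

gof
homog
```

Both tests have 15 degrees of freedom. Their statistics differ only by
the factor 1 / (1 - overall response rate), so they lead to the same
qualitative conclusion.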
>
> By striking three under-represented regions (3, 6, and 15), we get
> a more reasonable, although still not convincing, result:
>
>> chisq.test(ByRegion[setdiff(1:16,c(3,6,15)),])
>
> Pearson's Chi-squared test
>
> data: ByRegion[setdiff(1:16, c(3, 6, 15)), ]
> X-squared = 22.5643, df = 12, p-value = 0.03166
>
> This poses several questions:
>
> 1) Looking at a side-by-side barchart (proportion of responses vs.
> proportion of employees, per region), the pattern of survey responses
> appears, visually, to match fairly well the pattern of employees. Is
> this a case where we trust the numbers and not the picture?
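
One way to put numbers next to the picture is to tabulate, per region,
the share of employees, the share of respondents, and the response rate,
and to redraw the side-by-side barchart from the same table. A minimal
sketch, with the counts copied from your table and column names of my
own choosing:

```r
# Counts copied from the ByRegion table in the quoted message.
ByRegion <- data.frame(
  All.Employees      = c(735, 500, 897, 717, 167, 309, 806, 627,
                         858, 851, 336, 1823, 80, 774, 561, 834),
  Survey.Respondents = c(142, 83, 78, 133, 48, 0, 125, 122,
                         177, 160, 52, 312, 9, 121, 24, 134),
  row.names = paste0("Region_", 1:16)
)

emp_share  <- ByRegion$All.Employees / sum(ByRegion$All.Employees)
resp_share <- ByRegion$Survey.Respondents / sum(ByRegion$Survey.Respondents)

comparison <- data.frame(
  Employee.Share   = emp_share,
  Respondent.Share = resp_share,
  Difference       = resp_share - emp_share,
  Response.Rate    = ByRegion$Survey.Respondents / ByRegion$All.Employees,
  row.names = rownames(ByRegion)
)
round(comparison, 3)

# Side-by-side bars: one pair of bars per region.
barplot(rbind(Employees = emp_share, Respondents = resp_share),
        beside = TRUE, names.arg = rownames(ByRegion), las = 2,
        legend.text = TRUE, ylab = "Proportion of total")
```

The Response.Rate column tends to show the differences between regions
more clearly than the two share columns, because shares of a total of
16 regions are all small numbers.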
>
> 2) Part of the problem, ironically, is that there were too many responses
> to the survey. If we had only one-tenth the responses, but in the same
> proportions by region, the chi-square statistic would look much better
> (though with a warning about possible inaccuracy):
>
> data: data.frame(ByRegion$All.Employees, 0.1 *
> (ByRegion$Survey.Respondents))
> X-squared = 17.5912, df = 15, p-value = 0.2848
>
> Is there a way of reconciling a large response rate with an unrepresentative
> response profile? Or is the bad news that the survey will give very precise
> results about a very ill-specified sub-population?
>
> (Of course, I would put it in softer terms, like "you need to assess the degree
> of homogeneity across different regions".)
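
On the effect of the number of responses: for a goodness-of-fit test
against the employee shares (a different call from the one you used,
where the employee column was left unscaled), the statistic is exactly
proportional to the number of responses when the proportions are held
fixed. An effect size such as Cohen's w = sqrt(X^2 / n) removes that
dependence. A minimal sketch, with counts copied from your table and
names of my own:

```r
# Counts copied from the ByRegion table in the quoted message.
emp  <- c(735, 500, 897, 717, 167, 309, 806, 627,
          858, 851, 336, 1823, 80, 774, 561, 834)
resp <- c(142, 83, 78, 133, 48, 0, 125, 122,
          177, 160, 52, 312, 9, 121, 24, 134)
p_emp <- emp / sum(emp)   # expected share of responses per region

# Goodness of fit with all responses, and with one tenth of them in the
# same proportions (the small expected counts trigger a warning).
full  <- chisq.test(resp, p = p_emp)
tenth <- suppressWarnings(chisq.test(0.1 * resp, p = p_emp))

# The statistic shrinks by the same factor as the sample size.
unname(tenth$statistic / full$statistic)

# Cohen's w depends only on the proportions, not on the sample size.
cohen_w <- function(x, p) sqrt(sum((x / sum(x) - p)^2 / p))
w_full  <- cohen_w(resp, p_emp)
w_tenth <- cohen_w(0.1 * resp, p_emp)
c(w_full = w_full, w_tenth = w_tenth)
```

So a smaller sample does not make the response profile any more
representative; it only makes the test less able to detect the same
discrepancy.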
>
> 3) Is Chi-squared really the right measure of how representative the
> survey is?
>
> <<<<<<< >>>>>>>>>
>
> Thanks for any help you can give - hope these questions make sense -
>
> George H.
>