[R] Chi-Square test and survey results

Wed Oct 12 15:24:45 CEST 2011

George,

Perhaps the site of the RISQ project (Representativity Indicators for
Survey Quality) might be of use: http://www.risq-project.eu/. They
also provide R code to calculate their indicators.

HTH,
Jan

Quoting gheine at mathnmaps.com:

> An organization has asked me to comment on the validity of their
> recent all-employee survey.  Survey responses, by geographic region, compared
> with the total number of employees in each region, were as follows:
>
>> ByRegion
>           All.Employees Survey.Respondents
> Region_1            735                142
> Region_2            500                 83
> Region_3            897                 78
> Region_4            717                133
> Region_5            167                 48
> Region_6            309                  0
> Region_7            806                125
> Region_8            627                122
> Region_9            858                177
> Region_10           851                160
> Region_11           336                 52
> Region_12          1823                312
> Region_13            80                  9
> Region_14           774                121
> Region_15           561                 24
> Region_16           834                134
>
> How well does the survey represent the employee population?
> The chi-square test says not very well:
>
>> chisq.test(ByRegion)
>
>         Pearson's Chi-squared test
>
> data:  ByRegion
> X-squared = 163.6869, df = 15, p-value < 2.2e-16
>
> By striking three under-represented regions (3, 6, and 15), we get
> a more reasonable, although still not convincing, result:
>
>> chisq.test(ByRegion[setdiff(1:16,c(3,6,15)),])
>
>         Pearson's Chi-squared test
>
> data:  ByRegion[setdiff(1:16, c(3, 6, 15)), ]
> X-squared = 22.5643, df = 12, p-value = 0.03166
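
One caveat about the tests above: chisq.test(ByRegion) on the two-column
table treats the columns as independent samples, whereas the respondents
are presumably a subset of All.Employees. Below is a minimal sketch of two
alternatives, assuming every respondent is also counted in All.Employees:
a goodness-of-fit test of the respondents against the employee
proportions, and a homogeneity test of respondents against
non-respondents. The data frame is retyped from the table above; the names
p_emp, gof, resp_table and hom are illustrative only.

```r
# Data retyped from the ByRegion table in the original post
ByRegion <- data.frame(
  All.Employees      = c(735, 500, 897, 717, 167, 309, 806, 627,
                         858, 851, 336, 1823, 80, 774, 561, 834),
  Survey.Respondents = c(142, 83, 78, 133, 48, 0, 125, 122,
                         177, 160, 52, 312, 9, 121, 24, 134),
  row.names = paste("Region", 1:16, sep = "_")
)

# Goodness of fit: do respondents follow the regional distribution
# of all employees? (16 categories, so df = 15)
p_emp <- ByRegion$All.Employees / sum(ByRegion$All.Employees)
gof   <- chisq.test(ByRegion$Survey.Respondents, p = p_emp)

# Alternative: 16 x 2 table of respondents vs. non-respondents.
# Assumption: respondents are included in All.Employees.
resp_table <- cbind(
  Responded       = ByRegion$Survey.Respondents,
  Did.Not.Respond = ByRegion$All.Employees - ByRegion$Survey.Respondents
)
hom <- chisq.test(resp_table)

print(gof)
print(hom)
```

Either form avoids counting the respondents twice. Neither changes the
basic point that, with this many responses, even modest regional
differences will show up as significant.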
>
> This poses several questions:
>
> 1)  Looking at a side-by-side barchart (proportion of responses vs.
> proportion of employees, per region), the pattern of survey responses
> appears, visually, to match fairly well the pattern of employees.  Is
> this a case where we trust the numbers and not the picture?
>
> 2) Part of the problem, ironically, is that there were too many responses
> to the survey.  If we had only one-tenth the responses, but in the same
> proportions by region, the chi-square statistic would look much better
> (though with a warning about possible inaccuracy):
>
> data:  data.frame(ByRegion$All.Employees, 0.1 * (ByRegion$Survey.Respondents))
> X-squared = 17.5912, df = 15, p-value = 0.2848
>
> Is there a way of reconciling a large response rate with an unrepresentative
> response profile?  Or is the bad news that the survey will give very precise
> results about a very ill-specified sub-population?
>
> (Of course, I would put it in softer terms, like "you need to assess the
> degree of homogeneity across different regions".)
>
> 3) Is chi-squared really the right measure of how representative the
> survey is?
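
On questions 1 to 3, a rough sketch of how picture and numbers can be put
on the same footing. The goodness-of-fit chi-square statistic equals n
times the square of Cohen's w, an effect size computed from the two
proportion profiles alone, so w is unchanged when the number of responses
is scaled. The code below is only an illustration under that framing, not
the RISQ indicators; cohen_w, emp, resp and the other names are made up
here, and the data are retyped from the table above.

```r
# Counts retyped from the ByRegion table in the original post
emp  <- c(735, 500, 897, 717, 167, 309, 806, 627,
          858, 851, 336, 1823, 80, 774, 561, 834)
resp <- c(142, 83, 78, 133, 48, 0, 125, 122,
          177, 160, 52, 312, 9, 121, 24, 134)

p_emp  <- emp / sum(emp)    # regional share of all employees
p_resp <- resp / sum(resp)  # regional share of respondents
rate   <- resp / emp        # response rate within each region

# Side-by-side bars of the two profiles (question 1)
barplot(rbind(Employees = p_emp, Respondents = p_resp), beside = TRUE,
        names.arg = seq_along(emp), legend.text = TRUE,
        xlab = "Region", ylab = "Proportion")

# Cohen's w: distance between observed and reference proportions.
# It depends only on proportions, not on the number of responses.
cohen_w <- function(obs, p) {
  p_obs <- obs / sum(obs)
  sqrt(sum((p_obs - p)^2 / p))
}

w_full  <- cohen_w(resp, p_emp)        # all responses
w_tenth <- cohen_w(0.1 * resp, p_emp)  # one-tenth the responses, same w

print(round(rate, 3))
print(c(w_full = w_full, w_tenth = w_tenth))
```

Cohen's rough conventions for w are 0.1 (small), 0.3 (medium) and 0.5
(large). Read that way, the p-value mainly reflects how many responses
there were, while w and the per-region response rates describe how far the
response profile is from the employee profile.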
>
> <<<<<<< >>>>>>>>>
>
> Thanks for any help you can give - hope these questions make sense -
>
> George H.