[R] comparing strength of association instead of strength of evidence?

Sat Jun 25 01:21:21 CEST 2005

On 6/24/05, Kjetil Brinchmann Halvorsen <kjetil at acelerate.com> wrote:
> Weiwei Shi wrote:
> 
> >Hi,
> >I asked this question before, which was hidden in a bunch of
> >questions. I repharse it here and hope I can get some help this time:
> >
> >I have 2 contingency tables which have the same group variable Y. I
> >want to compare the strength of association between X1/Y and X2/Y. I
> >am not sure if comparing p-values IS the way  even though the
> >probability of seeing such "weird" observation under H0 defines
> >p-value and it might relate to the strength of association somehow.
> >But I read the following statement from Alan Agresti's "An
> >Introduction to Categorical Data Analysis" :
> >"Chi-squared tests simply indicate the degree of EVIDENCE for an
> >association....It is sensible to decompose chi-squared into
> >components, study residuals, and estimate parameters such as odds
> >ratios that describe the STRENGTH OF ASSOCIATION".
> >
> >
> >
> Here are some things you can do:
> 
>  > tab1<-array(c(11266, 125, 2151526, 31734), dim=c(2,2))
> 
>  > tab2<-array(c(43571, 52, 2119221, 31807), dim=c(2,2))
>  > library(epitools) # on CRAN
>  > ?odds.ratio
> Help for 'odds.ratio' is shown in the browser
>  > library(help=epitools) # on CRAN
>  > tab1
>      [,1]    [,2]
> [1,] 11266 2151526
> [2,]   125   31734
>  > odds.ratio(11266, 125, 2151526, 31734)
> Error in fisher.test(tab) : FEXACT error 40.
> Out of workspace.                 # so this are evidently for tables
> with smaller counts
>  > library(vcd) # on CRAN
> 
>  > ?oddsratio
> Help for 'oddsratio' is shown in the browser
>  > oddsratio( tab1)  # really is logodds ratio
> [1] 0.2807548
>  > plot(oddsratio( tab1) )
>  > library(help=vcd) # on CRAN  Read this for many nice functions.
>  > fourfoldplot(tab1)
>  > mosaicplot(tab1)     # not really usefull for this table
> 
> Also has a look at function Crosstable in package gmodels.
> 
> To decompose the chisqure you can program yourselves:
> 
> decomp.chi <- function(tab) {
>        rows <-  rowSums(tab)
>        cols <-   colSums(tab)
>        N <-   sum(rows)
>         E <- rows %o% cols / N
>         contrib <- (tab-E)^2/E
>         contrib }
> 
> 
>  > decomp.chi(tab1)
>          [,1]         [,2]
> [1,] 0.1451026 0.0007570624
> [2,] 9.8504915 0.0513942218
>  >
> 
> So you can easily see what cell contributes most to the overall chisquared.
> 
> Kjetil
> 
> 
> 
> 
> 
> >Can I do this "decomposition" in R for the following example including
> >2 contingency tables?
> >
> >
> >
> >>tab1<-array(c(11266, 125, 2151526, 31734), dim=c(2,2))
> >>tab1
> >>
> >>
> >      [,1]    [,2]
> >[1,] 11266 2151526
> >[2,]   125   31734
> >
> >
> >
> >>tab2<-array(c(43571, 52, 2119221, 31807), dim=c(2,2))
> >>tab2
> >>
> >>
> >      [,1]    [,2]
> >[1,] 43571 2119221
> >[2,]    52   31807
> >

Here are a few more ways of doing this using chisq.test,
glm and assocplot:

> ## chisq.test ###

> tab1.chisq <- chisq.test(tab1)

> # decomposition of chisq
> resid(tab1.chisq)^2
          [,1]         [,2]
[1,] 0.1451026 0.0007570624
[2,] 9.8504915 0.0513942218

> # same
> with(tab1.chisq, (observed - expected)^2/expected) 
          [,1]         [,2]
[1,] 0.1451026 0.0007570624
[2,] 9.8504915 0.0513942218

> # Pearson residuals
> resid(tab1.chisq) 
           [,1]        [,2]
[1,]  0.3809234 -0.02751477
[2,] -3.1385493  0.22670294

> # same
> with(tab1.chisq, (observed - expected)/sqrt(expected)) 
           [,1]        [,2]
[1,]  0.3809234 -0.02751477
[2,] -3.1385493  0.22670294

> ### glm ###
> # Pearson residuals via glm

> tab1.df <- data.frame(count = c(tab1), A = gl(2,2), B = gl(2,1,4))
> tab1.glm <- glm(count ~ ., tab1.df, family = poisson())
> resid(tab1.glm, type = "pearson")
          1           2           3           4 
 0.38092339 -3.13854927 -0.02751477  0.22670294 
> plot(tab1.glm)

> ### assocplot ###
> # displaying Pearson residuals via an assocplot
> assocplot(t(tab1))