[R] comparing strength of association instead of strength of evidence?
Gabor Grothendieck
ggrothendieck at gmail.com
Sat Jun 25 01:21:21 CEST 2005
On 6/24/05, Kjetil Brinchmann Halvorsen <kjetil at acelerate.com> wrote:
> Weiwei Shi wrote:
>
> >Hi,
> >I asked this question before, which was hidden in a bunch of
> >questions. I repharse it here and hope I can get some help this time:
> >
> >I have 2 contingency tables which have the same group variable Y. I
> >want to compare the strength of association between X1/Y and X2/Y. I
> >am not sure if comparing p-values IS the way even though the
> >probability of seeing such "weird" observation under H0 defines
> >p-value and it might relate to the strength of association somehow.
> >But I read the following statement from Alan Agresti's "An
> >Introduction to Categorical Data Analysis" :
> >"Chi-squared tests simply indicate the degree of EVIDENCE for an
> >association....It is sensible to decompose chi-squared into
> >components, study residuals, and estimate parameters such as odds
> >ratios that describe the STRENGTH OF ASSOCIATION".
> >
> >
> >
> Here are some things you can do:
>
> > tab1<-array(c(11266, 125, 2151526, 31734), dim=c(2,2))
>
> > tab2<-array(c(43571, 52, 2119221, 31807), dim=c(2,2))
> > library(epitools) # on CRAN
> > ?odds.ratio
> Help for 'odds.ratio' is shown in the browser
> > library(help=epitools) # on CRAN
> > tab1
> [,1] [,2]
> [1,] 11266 2151526
> [2,] 125 31734
> > odds.ratio(11266, 125, 2151526, 31734)
> Error in fisher.test(tab) : FEXACT error 40.
> Out of workspace. # so this are evidently for tables
> with smaller counts
> > library(vcd) # on CRAN
>
> > ?oddsratio
> Help for 'oddsratio' is shown in the browser
> > oddsratio( tab1) # really is logodds ratio
> [1] 0.2807548
> > plot(oddsratio( tab1) )
> > library(help=vcd) # on CRAN Read this for many nice functions.
> > fourfoldplot(tab1)
> > mosaicplot(tab1) # not really usefull for this table
>
> Also has a look at function Crosstable in package gmodels.
>
> To decompose the chisqure you can program yourselves:
>
> decomp.chi <- function(tab) {
> rows <- rowSums(tab)
> cols <- colSums(tab)
> N <- sum(rows)
> E <- rows %o% cols / N
> contrib <- (tab-E)^2/E
> contrib }
>
>
> > decomp.chi(tab1)
> [,1] [,2]
> [1,] 0.1451026 0.0007570624
> [2,] 9.8504915 0.0513942218
> >
>
> So you can easily see what cell contributes most to the overall chisquared.
>
> Kjetil
>
>
>
>
>
> >Can I do this "decomposition" in R for the following example including
> >2 contingency tables?
> >
> >
> >
> >>tab1<-array(c(11266, 125, 2151526, 31734), dim=c(2,2))
> >>tab1
> >>
> >>
> > [,1] [,2]
> >[1,] 11266 2151526
> >[2,] 125 31734
> >
> >
> >
> >>tab2<-array(c(43571, 52, 2119221, 31807), dim=c(2,2))
> >>tab2
> >>
> >>
> > [,1] [,2]
> >[1,] 43571 2119221
> >[2,] 52 31807
> >
Here are a few more ways of doing this using chisq.test,
glm and assocplot:
> ## chisq.test ###
> tab1.chisq <- chisq.test(tab1)
> # decomposition of chisq
> resid(tab1.chisq)^2
[,1] [,2]
[1,] 0.1451026 0.0007570624
[2,] 9.8504915 0.0513942218
> # same
> with(tab1.chisq, (observed - expected)^2/expected)
[,1] [,2]
[1,] 0.1451026 0.0007570624
[2,] 9.8504915 0.0513942218
> # Pearson residuals
> resid(tab1.chisq)
[,1] [,2]
[1,] 0.3809234 -0.02751477
[2,] -3.1385493 0.22670294
> # same
> with(tab1.chisq, (observed - expected)/sqrt(expected))
[,1] [,2]
[1,] 0.3809234 -0.02751477
[2,] -3.1385493 0.22670294
> ### glm ###
> # Pearson residuals via glm
> tab1.df <- data.frame(count = c(tab1), A = gl(2,2), B = gl(2,1,4))
> tab1.glm <- glm(count ~ ., tab1.df, family = poisson())
> resid(tab1.glm, type = "pearson")
1 2 3 4
0.38092339 -3.13854927 -0.02751477 0.22670294
> plot(tab1.glm)
> ### assocplot ###
> # displaying Pearson residuals via an assocplot
> assocplot(t(tab1))
More information about the R-help
mailing list