[R] 2x2 Contingency table with much sampling zeroes
Charles C. Berry
cberry at tajo.ucsd.edu
Tue Oct 20 18:09:25 CEST 2009
On Tue, 20 Oct 2009, Etienne Toffin wrote:
> Hi,
>
> I'm analyzing experimental results where two different events ("T1" and "T2")
> can occur or not during an experiment. I made my experiments with one factor
> ("Substrate") with two levels ("Sand" and "Clay").
> I would like to know whether or not "Substrate" affects the occurrence
> probability of the two events.
It is not clear to me what you mean by 'affects the occurrence ...'.
This sounds like 'Independence of Substrate from the two other variables',
which is a 3 degree of freedom hypothesis (at least in the example you
give).
Is that what you are after or are only some of those contrasts
interesting?
> Moreover, for each condition I would like to
> test the heterogeneity of my experimental contingency table with a
> theoretical one (from simulations).
>
Do you mean you have some prior values for the counts or proportions? If
so a standard goodness of fit test should do. If not, you need to describe
the problem in more detail.
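
As a minimal sketch of such a goodness of fit test in R: the observed counts below are the "Sand" counts from the data frame quoted later in this message, while the theoretical proportions in p_theory are purely hypothetical placeholders for whatever the simulations give. Because some expected counts are small, the sketch asks chisq.test() for a Monte Carlo p-value rather than relying on the chi-square approximation.

```r
# Observed counts for Sand, ordered as (T1, T2) =
# (YES, YES), (YES, NO), (NO, YES), (NO, NO); taken from the posted data.
obs_sand <- c(YY = 12, YN = 5, NY = 0, NN = 7)

# HYPOTHETICAL theoretical proportions (stand-ins for the simulation results);
# they must sum to 1.
p_theory <- c(YY = 0.50, YN = 0.20, NY = 0.05, NN = 0.25)

set.seed(1)  # only to make the Monte Carlo p-value reproducible
gof <- chisq.test(obs_sand, p = p_theory,
                  simulate.p.value = TRUE, B = 5000)

gof$expected   # expected counts under the theoretical proportions
gof$statistic  # Pearson chi-square statistic
gof$p.value    # Monte Carlo p-value
```

The same call can be repeated for "Clay" with its own observed counts and theoretical proportions.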
> However, my problem is that several cells have sampling zeroes. My
> experiments can't be done again to fill these cells. Thus Chi-square
> requirements are not fulfilled and I have to find another statistical method.
>
Sampling zeroes in the cells are not a problem as long as the marginal
tables do not have such zeroes. Depending on the hypotheses you want to
test, the marginal tables may be OK. 'Substrate' is OK and so is 'T1 by
T2', so you can do the 3 degree of freedom test implied by those margins.
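
A quick way to inspect those margins, sketched with the counts from the data frame quoted later in this message (the object names are illustrative only). With these counts the T1 = NO / T2 = YES combination is empty in both substrates, so that one cell of the 'T1 by T2' margin is itself zero, which is worth keeping in mind when counting degrees of freedom.

```r
# Data as posted in the question (reformatted over several lines).
DF <- data.frame(
  Subs = c(rep("Sand", 4), rep("Clay", 4)),
  T1   = rep(c("YES", "YES", "NO", "NO"), 2),
  T2   = rep(c("YES", "NO", "YES", "NO"), 2),
  Freq = c(12, 5, 0, 7, 24, 1, 0, 0)
)

# Three-way table of counts: Subs x T1 x T2.
tab <- xtabs(Freq ~ Subs + T1 + T2, data = DF)

# Marginal tables used by the independence hypothesis.
subs_margin <- margin.table(tab, 1)        # Substrate totals
t1t2_margin <- margin.table(tab, c(2, 3))  # T1 by T2 totals

subs_margin
t1t2_margin
```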
> After spending hours searching for a solution, I thought I could use
> loglinear model to answer my questions, but :
> - I'm not sure I can use loglinear model = do I fulfill the required
> conditions ?
Have you studied the Agresti reference listed in the help page?? I'll bet
it addresses 'the required conditions' - which go to the sampling
distribution of the counts.
> - would this method answer my hypothesis ?
> - I am not sure I really understand how I have to use loglin()…
>
run
example(loglin)
and reread
?loglin
The example is the same setup as you have here (albeit with more degrees
of freedom), so you might emulate it.
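
Emulating that example on the posted counts might look like the sketch below. It fits the model in which 'Substrate' is independent of 'T1 by T2' jointly, i.e. the margins are Subs and (T1, T2). Object names are illustrative. Note that, per ?loglin, the reported df carries no adjustment for structural zeros, so the p-value computed here is a rough guide only when a fitted cell is zero; the likelihood ratio statistic is used because it skips cells whose fitted value is zero.

```r
# Data as posted in the question.
DF <- data.frame(
  Subs = c(rep("Sand", 4), rep("Clay", 4)),
  T1   = rep(c("YES", "YES", "NO", "NO"), 2),
  T2   = rep(c("YES", "NO", "YES", "NO"), 2),
  Freq = c(12, 5, 0, 7, 24, 1, 0, 0)
)
tab <- xtabs(Freq ~ Subs + T1 + T2, data = DF)

# Fit [Subs][T1 T2]: dimension 1 on its own, dimensions 2 and 3 jointly.
fit <- loglin(tab, margin = list(1, c(2, 3)), fit = TRUE, print = FALSE)

fit$lrt  # likelihood ratio (deviance) statistic
fit$df   # degrees of freedom as reported by loglin(), unadjusted for zeros
fit$fit  # fitted counts under independence

# Approximate p-value from the chi-square reference distribution.
p_lrt <- pchisq(fit$lrt, df = fit$df, lower.tail = FALSE)
p_lrt
```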
> Here is the data frame of my results.
>
> DF<-data.frame(Subs=c(rep("Sand",4),rep("Clay",4)),T1=rep(c("YES","YES","NO","NO"),2),T2=rep(c("YES","NO","YES","NO"),2),Freq=c(12,5,0,7,24,1,0,0))
>
> What do you think of such data ? Can I use any statistical method to test my
> hypothesis ? Any advice ?
Recruit a statistician to your committee. Questions like these are better
hashed out in front of a blackboard than over the internet.
HTH,
Chuck
>
> Thanks,
>
> Etienne Toffin
>
>
> -------------------------------------------------------------------
> Etienne Toffin, PhD Student
> Unit of Social Ecology
> Université Libre de Bruxelles, CP 231
> Boulevard du Triomphe
> B-1050 Brussels
> Belgium
>
> Tel: +32(0)2/650.55.30
> Fax: +32(0)2/650.57.67
> http://www.ulb.ac.be/sciences/use/toffin.html
Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901