[R] Discrete Multivariate Analysis (log-linear model)

Jorge Magalhães jmagalhaes at oninetspeed.pt
Wed Apr 16 16:04:24 CEST 2003

I'm reading an old statistics book: Marvin J. Karson (1982), Multivariate
Statistical Methods, The Iowa State University Press, Iowa. In chapter
XI I found some information about discrete multivariate analysis. The
chapter is restricted to an introduction to log-linear models for the analysis
of multidimensional contingency tables. For example, in the log-linear model
for a 3-way table we can test several hypotheses:

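Notes:
MLE: maximum likelihood estimator
mijk: observed count in cell (i, j, k); a "+" subscript means summation over that index
Mijk: expected count in cell (i, j, k)
a, b, c: number of categories of A, B, C
Each hypothesis lists the u-terms that are set to zero.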

Hypothesis #   Hypothesis            df                 MLE of Mijk
1              (ABC)                 (a-1)(b-1)(c-1)    ----
2              (AB)(ABC)             (a-1)(b-1)c        mi+k m+jk / m++k
3              (AC)(ABC)             (a-1)(c-1)b        mij+ m+jk / m+j+
8              (AB)(AC)(BC)(ABC)     abc-a-b-c+2        mi++ m+j+ m++k / n²
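
The closed-form estimates in the last column can be computed directly from marginal totals. A minimal sketch in R, assuming the eight counts of Table X (quoted further down in this message) are stored in an array m indexed [i, j, k]; the names m, M2, M3 and M8 are illustrative only and do not come from the book:

```r
# Observed counts m[i, j, k] from Table X; R fills arrays with i varying fastest.
m <- array(c(1137, 547, 2160, 1363,    # k = 1: (i,j) = (1,1), (2,1), (1,2), (2,2)
             1091, 1415, 886, 1925),   # k = 2: same (i,j) order
           dim = c(2, 2, 2))
n <- sum(m)

# Hypothesis 8: Mijk = mi++ m+j+ m++k / n^2
mi <- apply(m, 1, sum)
mj <- apply(m, 2, sum)
mk <- apply(m, 3, sum)
M8 <- (mi %o% mj %o% mk) / n^2

# Hypotheses 2 and 3, computed cell by cell from the formulas in the table
M2 <- M3 <- array(0, dim = dim(m))
for (i in 1:2) for (j in 1:2) for (k in 1:2) {
  M2[i, j, k] <- sum(m[i, , k]) * sum(m[, j, k]) / sum(m[, , k])  # mi+k m+jk / m++k
  M3[i, j, k] <- sum(m[i, j, ]) * sum(m[, j, k]) / sum(m[, j, ])  # mij+ m+jk / m+j+
}
```

Hypothesis 1 has no closed-form estimate, so it needs an iterative fit instead.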

To make this easier to understand, Karson gives an example (p. 269):

"Table X presents data from a national sample of n = 10524 respondents
classified according to income, A; mobility, B; and education, C. Each
variable was categorized in two classes, with income classified as low
(< $12,500) and high otherwise, mobility classified as mobile if the respondent
has made one or more moves over the last five years and nonmobile otherwise,
and education classified as high school graduate or under versus some college
or above.
                                    Table X

                         Mobile (j=1)              Nonmobile (j=2)
                    High School   College      High School   College
                        k=1         k=2            k=1         k=2
Low income (i=1)       1137        1091           2160         886
High income (i=2)       547        1415           1363        1925

Observed counts:
m(111) = 1137
m(121) = 2160
and so on...
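
For reference, one way to enter Table X in R is as a 2 x 2 x 2 table indexed [i, j, k]. This is only a sketch; the object name m and the dimension labels are illustrative choices, not taken from the book:

```r
# R fills arrays with the first index (i) varying fastest, then j, then k.
m <- as.table(array(
  c(1137, 547, 2160, 1363,    # k = 1 (high school): (i,j) = (1,1), (2,1), (1,2), (2,2)
    1091, 1415, 886, 1925),   # k = 2 (college): same (i,j) order
  dim = c(2, 2, 2),
  dimnames = list(income    = c("low", "high"),
                  mobility  = c("mobile", "nonmobile"),
                  education = c("highschool", "college"))
))

m[1, 1, 1]   # count for i = 1, j = 1, k = 1
m[1, 2, 1]   # count for i = 1, j = 2, k = 1
sum(m)       # total sample size n

# Flat display with income in rows, similar to the layout of Table X
ftable(m, row.vars = "income")
```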

The saturated model is:

Lijk = ln Mijk
Lijk = u + u(A)(i) + u(B)(j) + u(C)(k) + u(AB)(ij) + u(AC)(ik) + u(BC)(jk) + u(ABC)(ijk)
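
As an illustration, the saturated model can be written in R as a Poisson regression with all main effects and interactions. This is a sketch under assumptions: the data frame tab and its column names are illustrative, and glm() uses treatment contrasts by default, so the individual coefficients are parameterised differently from the u-terms even though the fitted counts are the same.

```r
# Counts from Table X as an i x j x k table (i varies fastest in R's fill order)
m <- as.table(array(
  c(1137, 547, 2160, 1363, 1091, 1415, 886, 1925),
  dim = c(2, 2, 2),
  dimnames = list(income    = c("low", "high"),
                  mobility  = c("mobile", "nonmobile"),
                  education = c("highschool", "college"))
))

# One row per cell, with columns income, mobility, education, Freq
tab <- as.data.frame(m)

# A * B * C expands to all main effects plus all two- and three-way interactions
sat <- glm(Freq ~ income * mobility * education, family = poisson, data = tab)

df.residual(sat)   # no degrees of freedom are left in the saturated model
deviance(sat)      # essentially zero: fitted counts reproduce the observed ones
```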

For example, the independence hypothesis 8 states that income, mobility, and
education are independent variables when grouped according to the given cross
classification of low or high, mobile or nonmobile, and high school or college.
....." (end of quotation)

My main question is: how can I perform a similar analysis in the R
environment? I want to test hypothesis 8 and all the others. For each
hypothesis, I want to calculate X^2 and G^2 and select the best model, one
that fits the data moderately well.

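Note: G^2 = 2 SUM(mijk ln(mijk/Mijk))
Note: X^2 = SUM((mijk - Mijk)^2/Mijk)
where mijk is the observed count, Mijk is its MLE under the hypothesis, and
SUM runs over all cells (i, j, k).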
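
One possible approach, sketched here without claiming it is the only or best one, is loglin() from the stats package. It fits a hierarchical log-linear model by iterative proportional fitting and returns G^2 as $lrt and X^2 as $pearson. Its margin argument names the marginal tables that are fitted, i.e. the u-terms that are kept, so it is the complement of the terms set to zero in each hypothesis. The object names below are illustrative.

```r
# Counts from Table X, indexed [i, j, k]
m <- array(c(1137, 547, 2160, 1363, 1091, 1415, 886, 1925), dim = c(2, 2, 2))

# Dimensions: 1 = A (income), 2 = B (mobility), 3 = C (education)
margins <- list(
  H1 = list(c(1, 2), c(1, 3), c(2, 3)),  # u(ABC) = 0
  H2 = list(c(1, 3), c(2, 3)),           # u(AB) = u(ABC) = 0
  H3 = list(c(1, 2), c(2, 3)),           # u(AC) = u(ABC) = 0
  H8 = list(1, 2, 3)                     # all interaction terms = 0
)

fits <- lapply(margins, function(mg)
  loglin(m, margin = mg, fit = TRUE, eps = 1e-8, iter = 100, print = FALSE))

# One row per hypothesis: G^2, X^2, df and the p-value of G^2
res <- t(sapply(fits, function(f)
  c(G2 = f$lrt, X2 = f$pearson, df = f$df,
    p = pchisq(f$lrt, f$df, lower.tail = FALSE))))
print(res)

# The same statistics from the two formulas above, for hypothesis 8
M  <- fits$H8$fit
G2 <- 2 * sum(m * log(m / M))
X2 <- sum((m - M)^2 / M)
```

A small G^2 or X^2 relative to its df (a large p-value) indicates that the hypothesis is compatible with the data; comparing the rows of res is one way to choose among the models.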

Thanks very much, in advance.

Jorge Magalhães
