[R] Tabulating Sparse Contingency Table

Charles C. Berry cberry at tajo.ucsd.edu
Sat Mar 29 03:17:39 CET 2008



Dear 'Born',

There was a thread on this recently, but I cannot seem to find it.

The best suggestion (IMHO) was along these lines:

aggregate( rep(1,40), as.data.frame(diag(4)[sample(1:4,40,replace=TRUE),]), sum )
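
To make that a bit more concrete for data shaped like yours, here is a
minimal sketch. The data frame 'dat' and its columns x1..x5 are simulated
stand-ins (my assumption, not your data). The point is that aggregate()
only creates rows for combinations that actually occur, so the full 3^k
array is never built.

```r
# Simulated stand-in for the real data: five variables with up to 3 levels.
# (Hypothetical; replace `dat` with your own data frame.)
set.seed(1)
n   <- 200
x1  <- sample(1:3, n, replace = TRUE)
dat <- data.frame(x1 = x1,
                  x2 = x1,
                  x3 = sample(1:3, n, replace = TRUE),
                  x4 = 1L,
                  x5 = x1)

# Sum a column of ones within each observed combination of x1..x5.
# Combinations that never occur simply produce no row.
y <- aggregate(data.frame(Freq = rep(1, nrow(dat))), by = dat, FUN = sum)

# Order by decreasing frequency, as in the original post.
y <- y[order(y$Freq, decreasing = TRUE), ]
y
```

Note that aggregate() drops rows with an NA in any grouping column, so
missing values would need to be recoded first if you want them counted.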

See also

   http://thread.gmane.org/gmane.comp.lang.r.general/104798/focus=104841

and if you have a really big problem and access to unix utilities you
might consider something like this:

dat <- read.table( pipe('sort file.dat | uniq -c' ) )
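
A slightly fuller sketch of the same idea, assuming a unix-like system
with 'sort' and 'uniq' on the path. The temporary file, its contents, and
the column names are made up for illustration; your file would have one
observation per line with whitespace-separated fields.

```r
# Hypothetical input file: one observation per line, five fields each.
f <- tempfile(fileext = ".dat")
writeLines(c("1 3 1 1 1",
             "2 3 1 1 2",
             "1 3 1 1 1",
             "2 3 1 1 2",
             "2 3 1 1 2"), f)

# `sort` brings identical lines together; `uniq -c` then emits each
# distinct line once, prefixed by the number of times it occurred.
cmd <- paste("sort", shQuote(f), "| uniq -c")

# The count is the first field, followed by the original fields.
y <- read.table(pipe(cmd), col.names = c("Freq", paste0("x", 1:5)))
y <- y[order(y$Freq, decreasing = TRUE), ]
unlink(f)
y
```

The counting is done outside R, so only the distinct combinations and
their counts are ever read into memory.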


HTH,

Chuck

p.s. the 'netiquette' of this list is to identify yourself with an 
appropriate email handle or signature block.

On Fri, 28 Mar 2008, born.to.b.wyld at gmail.com wrote:

> I have a sparse contingency table (most cells are 0):
>
>> xtabs(~.,data[,idx:(idx+4)])
> , , x3 = 1, x4 = 1, x5 = 1
>
>   x2
> x1    1   2   3
>  1   0   0  31
>  2   0   0 112
>  3   0   0  94
>
> , , x3 = 2, x4 = 1, x5 = 1
>
>   x2
> x1    1   2   3
>  1   0   0   0
>  2   0   0   0
>  3   0   0   0
>
> , , x3 = 3, x4 = 1, x5 = 1
>
>   x2
> x1    1   2   3
>  1   0   0   0
>  2   0   0   0
>  3   0   0   0
>
> , , x3 = 1, x4 = 2, x5 = 1
>
>   x2
> x1    1   2   3
>  1   0   0   0
>  2   0   0   0
>  3   0   0   0
>
> , , x3 = 2, x4 = 2, x5 = 1
>
>   x2
> x1    1   2   3
>  1   0   0   0
>  2   0  18   0
>  3   0  27   0
>
> , , x3 = 3, x4 = 2, x5 = 1
>
>   x2
> x1    1   2   3
>  1   0   0   0
>  2   0   0   0
>  3   0   0   0
>
> , , x3 = 1, x4 = 3, x5 = 1
>
>   x2
> x1    1   2   3
>  1   0   0   0
>  2   0   0   0
>  3   0   0   0
>
> , , x3 = 2, x4 = 3, x5 = 1
>
>   x2
> x1    1   2   3
>  1   0   0   0
>  2   0   0   0
>  3   0   0   0
>
> , , x3 = 3, x4 = 3, x5 = 1
>
>   x2
> x1    1   2   3
>  1   0   0   0
>  2   1   0   0
>  3   2   0   0
>
> , , x3 = 1, x4 = 1, x5 = 2
>
>   x2
> x1    1   2   3
>  1   0   0 142
>  2   0   0 340
>  3   0   0   1
>
> , , x3 = 2, x4 = 1, x5 = 2
>
>   x2
> x1    1   2   3
>  1   0   0   0
>  2   0   0   0
>  3   0   0   0
>
> , , x3 = 3, x4 = 1, x5 = 2
>
>   x2
> x1    1   2   3
>  1   0   0   0
>  2   0   0   0
>  3   0   0   0
>
> , , x3 = 1, x4 = 2, x5 = 2
>
>   x2
> x1    1   2   3
>  1   0   0   0
>  2   0   0   0
>  3   0   0   0
>
> , , x3 = 2, x4 = 2, x5 = 2
>
>   x2
> x1    1   2   3
>  1   0   4   0
>  2   0  41   0
>  3   0   0   0
>
> , , x3 = 3, x4 = 2, x5 = 2
>
>   x2
> x1    1   2   3
>  1   0   0   0
>  2   0   0   0
>  3   0   0   0
>
> , , x3 = 1, x4 = 3, x5 = 2
>
>   x2
> x1    1   2   3
>  1   0   0   0
>  2   0   0   0
>  3   0   0   0
>
> , , x3 = 2, x4 = 3, x5 = 2
>
>   x2
> x1    1   2   3
>  1   0   0   0
>  2   0   0   0
>  3   0   0   0
>
> , , x3 = 3, x4 = 3, x5 = 2
>
>   x2
> x1    1   2   3
>  1   0   0   0
>  2   0   0   0
>  3   0   0   0
>
> , , x3 = 1, x4 = 1, x5 = 3
>
>   x2
> x1    1   2   3
>  1   0   0 173
>  2   0   0   4
>  3   0   0   0
>
> , , x3 = 2, x4 = 1, x5 = 3
>
>   x2
> x1    1   2   3
>  1   0   0   0
>  2   0   0   0
>  3   0   0   0
>
> , , x3 = 3, x4 = 1, x5 = 3
>
>   x2
> x1    1   2   3
>  1   0   0   0
>  2   0   0   0
>  3   0   0   0
>
> , , x3 = 1, x4 = 2, x5 = 3
>
>   x2
> x1    1   2   3
>  1   0   0   0
>  2   0   0   0
>  3   0   0   0
>
> , , x3 = 2, x4 = 2, x5 = 3
>
>   x2
> x1    1   2   3
>  1   0   0   0
>  2   0   0   0
>  3   0   0   0
>
> , , x3 = 3, x4 = 2, x5 = 3
>
>   x2
> x1    1   2   3
>  1   0   0   0
>  2   0   0   0
>  3   0   0   0
>
> , , x3 = 1, x4 = 3, x5 = 3
>
>   x2
> x1    1   2   3
>  1   0   0   0
>  2   0   0   0
>  3   0   0   0
>
> , , x3 = 2, x4 = 3, x5 = 3
>
>   x2
> x1    1   2   3
>  1   0   0   0
>  2   0   0   0
>  3   0   0   0
>
> , , x3 = 3, x4 = 3, x5 = 3
>
>   x2
> x1    1   2   3
>  1   0   0   0
>  2   0   0   0
>  3   0   0   0
>
>
> Now, I can do the following to get the sparse representation 'y' for the
> table above:
>
>> idx<-2
>> y<-as.data.frame.table(xtabs(~.,data[,idx:(idx+4)]))
>> y<-y[y$Freq>0,]
>> z<-sort(y$Freq,decreasing=T,index.return=T)
>> y<-y[z$ix,]
>> y
>    x1 x2 x3 x4 x5 Freq
> 89   2  3  1  1  2  340
> 169  1  3  1  1  3  173
> 88   1  3  1  1  2  142
> 8    2  3  1  1  1  112
> 9    3  3  1  1  1   94
> 122  2  2  2  2  2   41
> 7    1  3  1  1  1   31
> 42   3  2  2  2  1   27
> 41   2  2  2  2  1   18
> 121  1  2  2  2  2    4
> 170  2  3  1  1  3    4
> 75   3  1  3  3  1    2
> 74   2  1  3  3  1    1
> 90   3  3  1  1  2    1
>
> I am wondering if there is an R function, or a simple R routine which would
> help me make the data frame 'y' without using 'xtabs'. I need to study
> contingency tables of 20 (or even more) dimensions. R is unable to store a
> full 3^20 contingency table. But since the tables of interest are highly
> sparse, I figure the problem at hand could be highly simplified if I have
> something that would create a sparse representation.
>
> Any help or suggestions would be greatly appreciated.
>
> Thanks,
> A
>

Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu             UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901


