[R] Creating contingency table from mixed data

Sun May 6 10:48:14 CEST 2007

On 05-May-07 23:14:38, spime wrote:
> 
> Hi,
> 
> I am new in R. Please help me in the following case.
> 
> I have data in hand:
> http://www.nabble.com/file/8225/Data.txt
> 
> There are some categorical (binary and nominal) and continuous
> variables.
> 
> How can I get a generic RxC contingency table from this table? My main
> objective is to find the count in each cell and the mean of the
> continuous variables in each cell.
> 
> Please reply.
> 
> Thanks in advance

If what is in that file is all your data, then it is easily and
quite quickly (10 minutes) done by hand, facilitated by first
re-ordering your data as:

Var1    Var2      Var3     Var4       Var5
0        11         1        0         144
0        17         1        1         123
0        15         1        1         117
0        18         2        0          99
0        22         2        1         142

1        17         1        0         136
1        10         1        1         109
1         8         2        1         133
1        17         2        1         108
1        11         3        0         112
1        16         3        0         121
1        12         3        1         152
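
The re-ordering can also be done in R. The following is only a sketch: it assumes the twelve rows above have been typed in as a dataframe (the names Raw and Dat are illustrative; with the original file one would more likely start from read.table(), whose arguments depend on how the file is laid out). The rows are deliberately scrambled first so that order() has something to do.

```r
# Illustrative dataframe holding the twelve rows listed above
Dat <- data.frame(
  Var1 = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1),
  Var2 = c(11, 17, 15, 18, 22, 17, 10, 8, 17, 11, 16, 12),
  Var3 = c(1, 1, 1, 2, 2, 1, 1, 2, 2, 3, 3, 3),
  Var4 = c(0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1),
  Var5 = c(144, 123, 117, 99, 142, 136, 109, 133, 108, 112, 121, 152)
)

# Scramble the rows (here simply reversed) to stand in for unsorted input
Raw <- Dat[nrow(Dat):1, ]

# Sort by the categorical variables: Var1 first, then Var3, then Var4
Dat <- Raw[order(Raw$Var1, Raw$Var3, Raw$Var4), ]
rownames(Dat) <- NULL

print(Dat)
```

Rows that agree on Var1, Var3 and Var4 may come out in a different order from the listing above, which does not matter for the cell counts or means.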

From the re-ordered data, the following is easy to obtain:

                           Var3:
             ---------------------------------
Var1:0 |      1       |      2       |      3       |
=====================================================
Var4:0 |   (11,144)   |   (18, 99)   |              |
       |              |              |              |
-----------------------------------------------------
Count: |      1       |      1       |      0       |
 Mean: |   (11,144)   |   (18, 99)   |              |
=====================================================
Var4:1 |   (17,123)   |   (22,142)   |              |
       |   (15,117)   |              |              |
-----------------------------------------------------
Count: |      2       |      1       |      0       |
 Mean: |   (16,120)   |   (22,142)   |              |
=====================================================

                           Var3:
             ---------------------------------
Var1:1 |      1       |      2       |      3       |
=====================================================
Var4:0 |   (17,136)   |              |   (11,112)   |
       |              |              |   (16,121)   |
-----------------------------------------------------
Count: |      1       |      0       |      2       |
 Mean: |   (17,136)   |              | (13.5,116.5) |
=====================================================
Var4:1 |   (10,109)   |   ( 8,133)   |   (12,152)   |
       |              |   (17,108)   |              |
-----------------------------------------------------
Count: |      1       |      2       |      1       |
 Mean: |   (10,109)   | (12.5,120.5) |   (12,152)   |
=====================================================

To do it automatically, you could get the counts alone by
applying table() to the "factor" columns (Var1, Var3 and Var4,
taken all together). Thus (where "Dat" is a dataframe with columns
Var1,...,Var5):

> table(Dat$Var4,Dat$Var3,Dat$Var1,dnn=c("Var4","Var3","Var1"))
, , Var1 = 0

    Var3
Var4 1 2 3
   0 1 1 0
   1 2 1 0

, , Var1 = 1

    Var3
Var4 1 2 3
   0 1 0 2
   1 1 2 1

which is basically a contingency table format already,

or get counts and means together by applying by() to the whole
dataframe, using nrow() for the count and colMeans() for the means
of the "continuous" variables, thus:

CT <- by(Dat,list(var1=Dat$Var1,Var3=Dat$Var3,Var4=Dat$Var4),
         function(x){list(Count=nrow(x),Mean=colMeans(x[,c(2,5)]))})

which produces:

var1: 0
Var3: 1
Var4: 0
$Count
[1] 1

$Mean
Var2 Var5 
  11  144 

------------------------------------------------------------ 
var1: 1
Var3: 1
Var4: 0
$Count
[1] 1

$Mean
Var2 Var5 
  17  136 

------------------------------------------------------------ 
var1: 0
Var3: 2
Var4: 0
$Count
[1] 1

$Mean
Var2 Var5 
  18   99 

------------------------------------------------------------ 
var1: 1
Var3: 2
Var4: 0
NULL
------------------------------------------------------------ 
var1: 0
Var3: 3
Var4: 0
NULL
------------------------------------------------------------ 
var1: 1
Var3: 3
Var4: 0
$Count
[1] 2

$Mean
 Var2  Var5 
 13.5 116.5 

------------------------------------------------------------ 
var1: 0
Var3: 1
Var4: 1
$Count
[1] 2

$Mean
Var2 Var5 
  16  120 

------------------------------------------------------------ 
var1: 1
Var3: 1
Var4: 1
$Count
[1] 1

$Mean
Var2 Var5 
  10  109 

------------------------------------------------------------ 
var1: 0
Var3: 2
Var4: 1
$Count
[1] 1

$Mean
Var2 Var5 
  22  142 

------------------------------------------------------------ 
var1: 1
Var3: 2
Var4: 1
$Count
[1] 2

$Mean
 Var2  Var5 
 12.5 120.5 

------------------------------------------------------------ 
var1: 0
Var3: 3
Var4: 1
NULL
------------------------------------------------------------ 
var1: 1
Var3: 3
Var4: 1
$Count
[1] 1

$Mean
Var2 Var5 
  12  152 

but this format is not very convenient for incorporating into
a contingency table such as the one shown above (obtained by hand). 

Probably others can find a way to convert the above output from
CT into a contingency table.
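
One possible route, offered only as a sketch rather than a conversion of CT itself: tapply() returns an array laid out like the table() output above, so the cell means of each continuous variable can be set beside the counts. Dat is rebuilt here from the twelve rows listed earlier so that the example runs on its own; the names dims, cnt, m2, m5 and flat are illustrative.

```r
# Illustrative dataframe holding the twelve rows listed earlier
Dat <- data.frame(
  Var1 = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1),
  Var2 = c(11, 17, 15, 18, 22, 17, 10, 8, 17, 11, 16, 12),
  Var3 = c(1, 1, 1, 2, 2, 1, 1, 2, 2, 3, 3, 3),
  Var4 = c(0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1),
  Var5 = c(144, 123, 117, 99, 142, 136, 109, 133, 108, 112, 121, 152)
)

# Cell counts, in the same layout as the table() call shown earlier
cnt <- table(Dat$Var4, Dat$Var3, Dat$Var1, dnn = c("Var4", "Var3", "Var1"))

# Same cross-classification, used for the cell means
dims <- list(Var4 = Dat$Var4, Var3 = Dat$Var3, Var1 = Dat$Var1)
m2 <- tapply(Dat$Var2, dims, mean)   # mean of Var2 per cell, NA if empty
m5 <- tapply(Dat$Var5, dims, mean)   # mean of Var5 per cell, NA if empty

# Flat layout of the counts: Var1 and Var4 down the side, Var3 across
flat <- ftable(cnt, row.vars = c("Var1", "Var4"))

print(flat)
print(m2)
print(m5)
```

The three objects share the same cell layout, so a count and its two means can be read off at matching positions, e.g. cnt["1","1","0"], m2["1","1","0"] and m5["1","1","0"] for Var4=1, Var3=1, Var1=0.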

However, unless you have a lot of these to do, it may be quicker
to do one, or a few, by hand!

Hoping this helps,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 06-May-07                                       Time: 02:57:12
------------------------------ XFMail ------------------------------