Fri Jul 24 00:54:34 CEST 2020

On 23/07/2020 6:15 p.m., Sorkin, John wrote:
> The by function in the R program below is not giving me the sums
> I expect to see, viz.,
> 382+170=552
> 4730+170=4900
> 5+6=11
> 199+25=224
> ###################################################
> #full R program:
> mydata <- data.frame(covid=c(0,0,0,0,1,1,1,1),
> sex=(rep(c(1,1,0,0),2)),
> status=rep(c(1,0),2),
> values=c(382,4730,5,199,170,497,6,25))
> mydata
> by(mydata,list(mydata$sex,mydata$status),sum)
> by(mydata,list(mydata$sex,mydata$status),print)
> ###################################################

The problem is that you are summing all of mydata, not just
mydata$values, so each sum also includes the covid, sex and status
columns.  I think this code gives you the sums you want (although they
do not all match the ones you listed as expected; one of those looks
wrong to me):

by(mydata$values,list(mydata$sex,mydata$status),sum)

for 0,0, the sum is 224 = 199+25
for 0,1, the sum is  11 = 5+6
for 1,0, the sum is 5227 = 4730 + 497 (not 4730 + 170)
for 1,1, the sum is 552 = 382 + 170
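
To make the difference concrete, here is a minimal sketch that reuses the
data frame from the question. The `sex =` and `status =` labels in the
list, the explicit `rep(c(1, 0), 4)`, and the names `sub`, `res` and `tab`
are my own additions for illustration; tapply() and aggregate() are shown
only as possible alternatives to by().

```r
# Data frame from the question; status is written out to full length here
# instead of relying on data.frame() recycling rep(c(1, 0), 2).
mydata <- data.frame(covid  = c(0, 0, 0, 0, 1, 1, 1, 1),
                     sex    = rep(c(1, 1, 0, 0), 2),
                     status = rep(c(1, 0), 4),
                     values = c(382, 4730, 5, 199, 170, 497, 6, 25))

# sum() applied to a data frame adds up every column, not just `values`.
sub <- mydata[mydata$sex == 1 & mydata$status == 1, ]
sum(sub)         # 557 = 552 (values) + 1 (covid) + 2 (sex) + 2 (status)
sum(sub$values)  # 552

# Passing only the values column to by() gives the per-group sums.
res <- by(mydata$values, list(sex = mydata$sex, status = mydata$status), sum)
res

# Possible alternatives that return the same four sums.
tab <- tapply(mydata$values, list(sex = mydata$sex, status = mydata$status), sum)
tab
aggregate(values ~ sex + status, data = mydata, FUN = sum)
```

The extra 1, 2 and 2 in the first sum are exactly the gap between 557 and
552 in the output quoted below.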

Duncan Murdoch

> More complete explanation of my question
> I have created a simple dataframe having three factors:
>   mydata <- data.frame(covid=c(0,0,0,0,1,1,1,1),
>   sex=(rep(c(1,1,0,0),2)),
>   status=rep(c(1,0),2),
>   values=c(382,4730,5,199,170,497,6,25))
>   > mydata
>    covid sex status values
> 1     0   1      1    382
> 2     0   1      0   4730
> 3     0   0      1      5
> 4     0   0      0    199
> 5     1   1      1    170
> 6     1   1      0    497
> 7     1   0      1      6
> 8     1   0      0     25
> When I use the by function with a sum as an argument, I don’t
> get the sums that I would expect to
> receive based either on the listing of the dataframe above,
> or from using by with print as an argument:
>> by(mydata,list(mydata$sex,mydata$status),sum)
> : 0
> : 0
> [1] 225
> -------------------------------------------------------------------------------
> : 1
> : 0
> [1] 5230
> -------------------------------------------------------------------------------
> : 0
> : 1
> [1] 14
> -------------------------------------------------------------------------------
> : 1
> : 1
> [1] 557
> I expected to see the following sums:
> 382+170=552
> 4730+170=4900
> 5+6=11
> 199+25=224
> Which as can be seen by the output above, I am not getting.
> Using print as an argument to the by function, I get the values
> grouped as I would expect, but for some reason I get a double
> printing of the values!
>
>> by(mydata,list(mydata$sex,mydata$status),print)
>    covid sex status values
> 4     0   0      0    199
> 8     1   0      0     25
>    covid sex status values
> 2     0   1      0   4730
> 6     1   1      0    497
>    covid sex status values
> 3     0   0      1      5
> 7     1   0      1      6
>    covid sex status values
> 1     0   1      1    382
> 5     1   1      1    170
> : 0
> : 0
>    covid sex status values
> 4     0   0      0    199
> 8     1   0      0     25
> -------------------------------------------------------------------------------
> : 1
> : 0
>    covid sex status values
> 2     0   1      0   4730
> 6     1   1      0    497
> -------------------------------------------------------------------------------
> : 0
> : 1
>    covid sex status values
> 3     0   0      1      5
> 7     1   0      1      6
> -------------------------------------------------------------------------------
> : 1
> : 1
>    covid sex status values
> 1     0   1      1    382
> 5     1   1      1    170
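
On the double printing: as far as I can tell it comes from print() being
called once per group (the first copy), after which the by object itself
is auto-printed at the prompt (the second copy, with the ": 0" headers).
print() returns its argument invisibly, so by() ends up holding the four
subsets. A minimal sketch, with the names `groups` and `groups2` and the
list labels being my own illustrative choices:

```r
mydata <- data.frame(covid  = c(0, 0, 0, 0, 1, 1, 1, 1),
                     sex    = rep(c(1, 1, 0, 0), 2),
                     status = rep(c(1, 0), 4),
                     values = c(382, 4730, 5, 199, 170, 497, 6, 25))

# print() runs once per group as a side effect; assigning the result
# stops the by object from being auto-printed a second time.
groups <- by(mydata, list(sex = mydata$sex, status = mydata$status), print)

# To inspect the groups without any side effect, return each subset
# unchanged and print the by object once.
groups2 <- by(mydata, list(sex = mydata$sex, status = mydata$status),
              function(d) d)
groups2
```

Either form shows each group a single time.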
