[R] by function with sum does not give what is expected from by function with print
Rui Barradas
ru|pb@rr@d@@ @end|ng |rom @@po@pt
Fri Jul 24 06:52:03 CEST 2020
Hello,
These two give the same results:
aggregate(values ~ sex + status, mydata, sum)
#  sex status values
#1   0      0    224
#2   1      0   5227
#3   0      1     11
#4   1      1    552
by(mydata$values, list(mydata$sex, mydata$status), sum)
#: 0
#: 0
#[1] 224
#------------------------------------------------------------
#: 1
#: 0
#[1] 5227
#------------------------------------------------------------
#: 0
#: 1
#[1] 11
#------------------------------------------------------------
#: 1
#: 1
#[1] 552
So Duncan is right: the 2nd sum in your expected output is wrong. The
right sum is
mydata rows 2 and 6: 4730 + 497 == 5227
(the value in row 6 is 497, not 170).
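As for why by(mydata, ..., sum) returned 225, 5230, 14 and 557: as far as I can tell, by() passes each group's rows to sum() as a whole data frame, so covid, sex and status are added in along with values. A minimal sketch using the data from the original post (grp and res are only illustrative names):

```r
mydata <- data.frame(covid  = c(0, 0, 0, 0, 1, 1, 1, 1),
                     sex    = rep(c(1, 1, 0, 0), 2),
                     status = rep(c(1, 0), 2),   # recycled to 8 rows, as in the original post
                     values = c(382, 4730, 5, 199, 170, 497, 6, 25))

# The group sex == 0, status == 0 is rows 4 and 8
grp <- mydata[mydata$sex == 0 & mydata$status == 0, ]

sum(grp)         # sums every column: covid (1) + sex (0) + status (0) + values (224)
sum(grp$values)  # sums the values column only

# Keeping the data frame form of by(), but summing one column per group
res <- by(mydata, list(mydata$sex, mydata$status),
          function(d) sum(d$values))
res
```

The last call should agree with the aggregate() and by(mydata$values, ...) results above.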
Another option, returning a matrix,
tapply(mydata$values, list(mydata$sex, mydata$status), sum)
#     0   1
#0  224  11
#1 5227 552
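On the double printing with by(..., print) in the quoted message: my reading is that print() writes each group while by() is calling it, and then the by object that comes back is auto-printed at top level, so every group shows up twice. A small sketch, assuming only the first copy is wanted:

```r
mydata <- data.frame(covid  = c(0, 0, 0, 0, 1, 1, 1, 1),
                     sex    = rep(c(1, 1, 0, 0), 2),
                     status = rep(c(1, 0), 2),   # recycled to 8 rows, as in the original post
                     values = c(382, 4730, 5, 199, 170, 497, 6, 25))

# print() still runs once per group; invisible() suppresses the
# auto-printing of the by object returned at top level
invisible(by(mydata, list(mydata$sex, mydata$status), print))
```

Assigning the result to a variable instead of wrapping it in invisible() should have the same effect.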
Hope this helps,
Rui Barradas
At 23:15 on 23/07/2020, Sorkin, John wrote:
> Colleagues,
>
> The by function in the R program below is not giving me the sums
> I expect to see, viz.,
> 382+170=552
> 4730+170=4900
> 5+6=11
> 199+25=224
> ###################################################
> #full R program:
> mydata <- data.frame(covid=c(0,0,0,0,1,1,1,1),
> sex=(rep(c(1,1,0,0),2)),
> status=rep(c(1,0),2),
> values=c(382,4730,5,199,170,497,6,25))
> mydata
> by(mydata,list(mydata$sex,mydata$status),sum)
> by(mydata,list(mydata$sex,mydata$status),print)
> ###################################################
>
> More complete explanation of my question
>
> I have created a simple dataframe having three factors:
> mydata <- data.frame(covid=c(0,0,0,0,1,1,1,1),
> sex=(rep(c(1,1,0,0),2)),
> status=rep(c(1,0),2),
> values=c(382,4730,5,199,170,497,6,25))
>
> > mydata
>   covid sex status values
> 1     0   1      1    382
> 2     0   1      0   4730
> 3     0   0      1      5
> 4     0   0      0    199
> 5     1   1      1    170
> 6     1   1      0    497
> 7     1   0      1      6
> 8     1   0      0     25
>
> When I use the by function with a sum as an argument, I don’t
> get the sums that I would expect to
> receive based either on the listing of the dataframe above,
> or from using by with print as an argument:
>
>> by(mydata,list(mydata$sex,mydata$status),sum)
> : 0
> : 0
> [1] 225
> -------------------------------------------------------------------------------
> : 1
> : 0
> [1] 5230
> -------------------------------------------------------------------------------
> : 0
> : 1
> [1] 14
> -------------------------------------------------------------------------------
> : 1
> : 1
> [1] 557
>
> I expected to see the following sums:
> 382+170=552
> 4730+170=4900
> 5+6=11
> 199+25=224
> Which as can be seen by the output above, I am not getting.
>
> Using print as an argument to the by function, I get the values
> grouped as I would expect, but for some reason I get a double
> printing of the values!
>
>> by(mydata,list(mydata$sex,mydata$status),print)
> covid sex status values
> 4 0 0 0 199
> 8 1 0 0 25
> covid sex status values
> 2 0 1 0 4730
> 6 1 1 0 497
> covid sex status values
> 3 0 0 1 5
> 7 1 0 1 6
> covid sex status values
> 1 0 1 1 382
> 5 1 1 1 170
> : 0
> : 0
> covid sex status values
> 4 0 0 0 199
> 8 1 0 0 25
> -------------------------------------------------------------------------------
> : 1
> : 0
> covid sex status values
> 2 0 1 0 4730
> 6 1 1 0 497
> -------------------------------------------------------------------------------
> : 0
> : 1
> covid sex status values
> 3 0 0 1 5
> 7 1 0 1 6
> -------------------------------------------------------------------------------
> : 1
> : 1
> covid sex status values
> 1 0 1 1 382
> 5 1 1 1 170
>
> What am I doing wrong, or what don’t I understand
> about the by function?
>
> Thank you
> John
>
> John David Sorkin M.D., Ph.D.
>
> Professor of Medicine
>
> Chief, Biostatistics and Informatics
>
> University of Maryland School of Medicine Division of Gerontology and Geriatric Medicine
>
> Baltimore VA Medical Center
>
> 10 North Greene Street
>
> GRECC (BT/18/GR)
>
> Baltimore, MD 21201-1524
>
> (Phone) 410-605-7119
>
> (Fax) 410-605-7913 (Please call phone number above prior to faxing)