[R] Create new data frame with conditional sums

peter dalgaard pdalgd sending from gmail.com
Tue Oct 24 13:56:25 CEST 2023


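This seems to work. There are a couple of fine points, notably handling duplicated Pct values correctly, which is easier if you take the cumulative sum in reverse order: after sorting by Pct, the first row of each run of tied values then already carries the sum over all rows with Pct at or above that value.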

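> ctof <- seq(0, .15, .01)  # cutoff grid from the question (assumed; not defined in the original message)
> dd2 <- dummydata[order(dummydata$Pct),]
> dd2$Cum <- rev(cumsum(rev(dd2$Totpop)))
> use <- !duplicated(dd2$Pct)
> approx(dd2$Pct[use], dd2$Cum[use], ctof, method="constant", f=1, rule=2)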
$x
 [1] 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.11 0.12 0.13 0.14
[16] 0.15

$y
 [1] 43800 43800 39300 39300 31000 26750 22750 17800 12700 12700  8000  8000
[13]  8000  3900  3900  3900
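
To get from the approx() result to the two-column data frame asked for in the original question, something like the following sketch might do. It repeats the steps above in script form; the names ctof, res, result and check are only illustrative, ctof is assumed to be the cutoff grid from the question, and the sapply() line is a brute-force cross-check rather than part of the method.

```r
# Toy data from the original post
dummydata <- data.frame(
  Tract  = 1:10,
  Pct    = c(0.05, 0.03, 0.01, 0.12, 0.21, 0.04, 0.07, 0.09, 0.06, 0.03),
  Totpop = c(4000, 3500, 4500, 4100, 3900, 4250, 5100, 4700, 4950, 4800)
)

# Assumption: ctof is the cutoff grid described in the question
ctof <- seq(0, 0.15, by = 0.01)

# Sort by Pct, then accumulate Totpop from the largest Pct downwards,
# so each row holds the sum over all rows at or after it in sorted order
dd2 <- dummydata[order(dummydata$Pct), ]
dd2$Cum <- rev(cumsum(rev(dd2$Totpop)))

# Keep only the first row of each run of tied Pct values;
# that row carries the sum over all rows with Pct >= that value
use <- !duplicated(dd2$Pct)

# Step function: f = 1 takes the value at the right end of each interval,
# rule = 2 extends the end values beyond the observed Pct range
res <- approx(dd2$Pct[use], dd2$Cum[use], ctof,
              method = "constant", f = 1, rule = 2)

# Assemble the requested two-column data frame
result <- data.frame(Cutoff = res$x, Pop = res$y)

# Brute-force cross-check against the direct definition
check <- sapply(ctof, function(p) sum(dummydata$Totpop[dummydata$Pct >= p]))
stopifnot(isTRUE(all.equal(result$Pop, check)))
```

One caveat with rule=2: for a cutoff above max(Pct) it would return the last cumulative value rather than 0, so the cutoff grid should stay within the observed range, as it does here.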


> On 14 Oct 2023, at 17:10 , Bert Gunter <bgunter.4567 using gmail.com> wrote:
> 
> Well, here's one way to do it:
> (dat is your example data frame)
> 
> Cutoff <- seq(0, .15, .01)
> Pop <- with(dat, sapply(Cutoff, \(p)sum(Totpop[Pct >= p])))
> 
> I think there must be a more efficient way to do it with cumsum(), though.
> 
> Cheers,
> Bert
> 
> On Sat, Oct 14, 2023 at 12:53 AM Jason Stout, M.D. <jason.stout using duke.edu> wrote:
>> 
>> This seems like it should be simple but I can't get it to work properly.  I'm starting with a data frame like this:
>> 
>> Tract   Pct  Totpop
>>     1  0.05    4000
>>     2  0.03    3500
>>     3  0.01    4500
>>     4  0.12    4100
>>     5  0.21    3900
>>     6  0.04    4250
>>     7  0.07    5100
>>     8  0.09    4700
>>     9  0.06    4950
>>    10  0.03    4800
>> 
>> And I want to end up with a data frame with two columns, a "Cutoff" column that is a simple sequence of equally spaced cutoffs (let's say in this case from 0-0.15 by 0.01) and a "Pop" column which equals the sum of "Totpop" in the prior data frame in which "Pct" is greater than or equal to "cutoff."  So in this toy example, this is what I want for a result:
>> 
>>   Cutoff   Pop
>> 1    0.00 43800
>> 2    0.01 43800
>> 3    0.02 39300
>> 4    0.03 39300
>> 5    0.04 31000
>> 6    0.05 26750
>> 7    0.06 22750
>> 8    0.07 17800
>> 9    0.08 12700
>> 10   0.09 12700
>> 11   0.10  8000
>> 12   0.11  8000
>> 13   0.12  8000
>> 14   0.13  3900
>> 15   0.14  3900
>> 16   0.15  3900
>> 
>> I can do this with a for loop but it seems there should be an easier, vectorized way that would be more efficient.  Here is a reproducible example:
>> 
>> dummydata <- data.frame(Tract = seq(1, 10, by = 1),
>>                         Pct = c(0.05, 0.03, 0.01, 0.12, 0.21, 0.04, 0.07, 0.09, 0.06, 0.03),
>>                         Totpop = c(4000, 3500, 4500, 4100, 3900, 4250, 5100, 4700, 4950, 4800))
>> dfrm <- data.frame(matrix(ncol = 2, nrow = 0, dimnames = list(NULL, c("Cutoff", "Pop"))))
>> for (i in seq(0, 0.15, by = 0.01)) {
>>   temp <- sum(dummydata[dummydata$Pct >= i, "Totpop"])
>>   dfrm[nrow(dfrm) + 1, ] <- c(i, temp)
>> }
>> 
>> Jason Stout, MD, MHS
>> Division of Infectious Diseases
>> Dept of Medicine
>> Duke University
>> Box 102359-DUMC
>> Durham, NC 27710
>> FAX 919-681-7494
>> 
>> 
>> 
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes using cbs.dk  Priv: PDalgd using gmail.com


