[R] Create new data frame with conditional sums

Leonard Mada |eo@m@d@ @end|ng |rom @yon|c@eu
Mon Oct 16 13:41:48 CEST 2023

Dear Jason,

The code could look something like:

dummyData = data.frame(Tract=seq(1, 10, by=1),
     Pct = c(0.05,0.03,0.01,0.12,0.21,0.04,0.07,0.09,0.06,0.03),
     Totpop = c(4000,3500,4500,4100,3900,4250,5100,4700,4950,4800))

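# Define the cutoffs
# - allow for duplicate entries;
# - the breaks must cover max(Pct): any value above the last break
#   would become NA in cut();
by = 0.03; # by = 0.01;
cutoffs <- seq(0, max(dummyData$Pct) + by, by = by)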

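# Create a new column with cutoffs
# - intervals are closed on the right, e.g. (0, 0.03];
# - include.lowest = TRUE also keeps Pct = 0 in the first interval;
dummyData$Cutoff <- cut(dummyData$Pct, breaks = cutoffs,
     labels = cutoffs[-1], ordered_result = TRUE,
     include.lowest = TRUE)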

# Sort data
# - we could actually order only the columns:
#   Totpop & Cutoff;
dummyData = dummyData[order(dummyData$Cutoff), ]

# Result
cs = cumsum(dummyData$Totpop)

# Only last entry:
# - I do not have a nice one-liner, but this should do it:
isLast = rev(! duplicated(rev(dummyData$Cutoff)))

data.frame(Total = cs[isLast],
     Cutoff = dummyData$Cutoff[isLast])
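
A possibly shorter variant, offered only as a sketch on the toy data: sum Totpop within each cutoff level using tapply, then take the cumulative sum of those per-level totals. The data are rebuilt here so that the snippet runs on its own. Extending the breaks past max(Pct) is my assumption, and the names perLevel and totals are only illustrative. Unlike the version with isLast, empty levels are kept in the result and simply repeat the previous running total.

```r
dummyData <- data.frame(Tract = 1:10,
     Pct = c(0.05,0.03,0.01,0.12,0.21,0.04,0.07,0.09,0.06,0.03),
     Totpop = c(4000,3500,4500,4100,3900,4250,5100,4700,4950,4800))

by <- 0.03
# Assumption: the breaks extend past max(Pct), so that no row becomes NA
cutoffs <- seq(0, max(dummyData$Pct) + by, by = by)
dummyData$Cutoff <- cut(dummyData$Pct, breaks = cutoffs,
     labels = cutoffs[-1], ordered_result = TRUE)

# Total population per cutoff level;
# levels without any rows come back as NA and are set to 0
perLevel <- tapply(dummyData$Totpop, dummyData$Cutoff, sum)
perLevel[is.na(perLevel)] <- 0

# Running total over the ordered levels
totals <- data.frame(Cutoff = names(perLevel),
     Total = cumsum(as.vector(perLevel)))
totals
```

No explicit sorting is needed here, because tapply returns the sums in the order of the factor levels.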



On 10/15/2023 7:41 PM, Leonard Mada wrote:
> Dear Jason,
> I do not think that the solution based on aggregate offered by GPT was 
> correct. That quasi-solution only aggregates for every individual level.
> As I understand, you want the cumulative sum. The idea was proposed by 
> Bert; you need only to sort first based on the cutoff (e.g. using an 
> ordered factor). And then only extract the last value for each level. 
> If Pct is unique, then you can skip this last step and use the cumsum
> directly (but on the sorted data set).
> Alternatives: see the solutions with loops or with sapply.
> Sincerely,
> Leonard
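
For comparison, the sapply alternative mentioned in the quoted message could look roughly like the sketch below. It skips cut() and sorting: for each cutoff it sums Totpop over all tracts with Pct at or below that cutoff. The cutoff vector, the small tolerance tol and the name totalBelow are my own assumptions for illustration. The tolerance is there only because comparing doubles with <= can misplace values that lie exactly on a cutoff.

```r
dummyData <- data.frame(Tract = 1:10,
     Pct = c(0.05,0.03,0.01,0.12,0.21,0.04,0.07,0.09,0.06,0.03),
     Totpop = c(4000,3500,4500,4100,3900,4250,5100,4700,4950,4800))

# Upper bounds only (assumed values; they must reach max(Pct))
cutoffs <- seq(0.03, 0.24, by = 0.03)
# Hypothetical tolerance for values lying exactly on a cutoff
tol <- 1e-9

# For each cutoff: total population of all tracts with Pct <= cutoff
totalBelow <- sapply(cutoffs, function(ct) {
     sum(dummyData$Totpop[dummyData$Pct <= ct + tol])
})

data.frame(Cutoff = cutoffs, Total = totalBelow)
```

This scans the whole data set once per cutoff, which is fine for a handful of cutoffs; the sorted cumsum approach needs only a single pass.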
