[R] Band-wise Sum

Fri Aug 27 16:36:24 CEST 2010

On Aug 27, 2010, at 9:49 AM, Vincy Pyne wrote:

> Hi
>
> I have a large credit portfolio (exceeding 50000 borrowers). For a  
> particular process I need to add up the exposures based on the  
> bands. I am giving a small test data set below.

I would think that cut() would be the accepted method for defining a  
factor variable based on specified cutpoints. If you then wanted to  
see what the cumsum() of the counts was across the range of possible  
levels, that too would be a fairly simple task.

df$ead.cat <- cut(df$ead, breaks=c(0, 100000, 500000, 1000000,  
2000000, 5000000 , 10000000, 100000000) )
df
with(df, tapply(ead.cat, rating, length))
#  A  AA AAA   B  BB BBB
# 10   8   2   1   4   7
with(df, tapply(ead.cat, rating, table))
# returns a list of table objects by bond rating

lapply( with(df, tapply(ead.cat, rating, table)) , cumsum)
# returns the cumsum of those tables

# sapply gives a more compact output of that result:
sapply( with(df, tapply(ead.cat, rating, table)) , cumsum)
#                 A AA AAA B BB BBB
# (0,1e+05]      4  2   1 0  3   1
# (1e+05,5e+05]  8  2   1 1  3   1
# (5e+05,1e+06]  9  2   1 1  3   1
# (1e+06,2e+06]  9  4   2 1  4   3
# (2e+06,5e+06]  9  5   2 1  4   4
# (5e+06,1e+07] 10  5   2 1  4   7
# (1e+07,1e+08] 10  8   2 1  4   7
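
Note that the tables above count borrowers per band. If the exposure totals themselves are wanted, something along these lines might do. This is only a sketch that rebuilds the test data from the original post; band_sums and band_cumsums are names made up here for illustration.

```r
# Test data copied from the original post
rating <- c("A", "AAA", "A", "BBB", "AA", "A", "BB", "BBB", "AA", "AA",
            "AA", "A", "A", "AA", "BB", "BBB", "AA", "A", "AAA", "BBB", "BBB", "BB",
            "A", "BB", "A", "AA", "B", "A", "AA", "BBB", "A", "BBB")
ead <- c(169229.93, 100, 5877794.25, 9530148.63, 75040962.06, 21000,
         1028360, 6000000, 17715000, 14430325.24, 1180946.57, 150000,
         167490, 81255.16, 54812.5, 3000, 1275702.94, 9100, 1763142.3,
         3283048.61, 1200000, 11800, 3000, 96894.02, 453671.72, 7590,
         106065.24, 940711.67, 2443000, 9500000, 39000, 1501939.67)
df <- data.frame(rating, ead)
df$ead.cat <- cut(df$ead, breaks = c(0, 100000, 500000, 1000000,
                                     2000000, 5000000, 10000000, 100000000))

# Sum of ead per band (rows) and rating (columns);
# band/rating combinations with no borrowers come back as NA
band_sums <- with(df, tapply(ead, list(ead.cat, rating), sum))
band_sums[is.na(band_sums)] <- 0

# Running total of exposure down the bands, one column per rating
band_cumsums <- apply(band_sums, 2, cumsum)
band_cumsums
```

Values outside the outermost breaks come back from cut() as NA and drop out of these sums, so with exposures running into the billions the last break would need to be raised.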

Loops, you say we need loops? We don't need no stinkin' loops.

-- 
David.

>
> rating <- c("A", "AAA", "A", "BBB","AA","A","BB", "BBB", "AA", "AA",  
> "AA", "A", "A", "AA","BB","BBB","AA", "A", "AAA","BBB","BBB", "BB",  
> "A", "BB", "A", "AA", "B","A", "AA", "BBB", "A", "BBB")
>
> ead <- c(169229.93,100, 5877794.25, 9530148.63, 75040962.06, 21000,  
> 1028360,  6000000, 17715000,  14430325.24, 1180946.57, 150000,  
> 167490, 81255.16, 54812.5, 3000, 1275702.94, 9100, 1763142.3,  
> 3283048.61, 1200000, 11800, 3000,  96894.02,  453671.72,  7590,  
> 106065.24, 940711.67,  2443000, 9500000, 39000, 1501939.67)
>
> ## First I have sorted the data rating-wise as
>
> df <- data.frame(rating, ead)
>
> df_sorted <-
> df[order(df$rating),]
>
> df_sorted_AAA <- subset(df_sorted, rating=="AAA")
> df_sorted_AA <- subset(df_sorted, rating=="AA")
> df_sorted_A <- subset(df_sorted, rating=="A")
> df_sorted_BBB <- subset(df_sorted, rating=="BBB")
> df_sorted_BB <- subset(df_sorted, rating=="BB")
> df_sorted_B <- subset(df_sorted, rating=="B")
> df_sorted_CCC <- subset(df_sorted, rating=="CCC")
>
> ## we begin with BBB rating. The R output for df_sorted_BBB is as  
> follows
>
>> df_sorted_BBB
>       rating      ead
> 4     BBB      9530149
> 8     BBB      6000000
> 16    BBB     3000
> 20    BBB     3283049
> 21    BBB     1200000
> 30    BBB     9500000
> 32    BBB     1501940
>
> My problem is that I need the totals of the eads falling in the  
> respective bands.
>
> I am defining bands in millions as
>
> seq_BBB <- seq(1000000, max(df_sorted_BBB$ead), by = 1000000)
>
> # The output is
> [1] 1e+06 2e+06 3e+06 4e+06 5e+06 6e+06 7e+06 8e+06 9e+06
>
> So for the sub data pertaining to Rating "BBB", I want corresponding  
> ead totals i.e. I want ead totals where ead < 1e+06, then I want ead  
> totals where 1e+06 < ead < 2e+06, 2e+06 < ead < 3e+06 ...and so on.
>
> I have tried the following code
>
> s_BBB <- NULL
>
> for (i in 1:length(s_BBB))
> {
> s_BBB[i] = sum(subset(df_sorted_BBB$ead, df_sorted_BBB$ead <  
> s_BBB[i]))
> }
>
> I was trying to find totals of eads < 1e+06, ead < 2e+06, ead < 3e+06  
> and so on.
>
> but the result is
>
>> s_BBB
> [1] 0
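
The loop returns 0 because s_BBB is NULL when the loop starts, so 1:length(s_BBB) is 1:0, and the comparison is made against s_BBB itself rather than against seq_BBB. A possible repair, sketched on the BBB values only (ead_BBB and s_BBB_vec are names introduced here for illustration):

```r
# BBB exposures taken from the test data in this post
ead_BBB <- c(9530148.63, 6000000, 3000, 3283048.61, 1200000,
             9500000, 1501939.67)
seq_BBB <- seq(1000000, max(ead_BBB), by = 1000000)

# Loop version: pre-allocate the result and iterate over the cutpoints
s_BBB <- numeric(length(seq_BBB))
for (i in seq_along(seq_BBB)) {
  # total of all eads strictly below the i-th cutpoint
  s_BBB[i] <- sum(ead_BBB[ead_BBB < seq_BBB[i]])
}

# The same totals without an explicit loop
s_BBB_vec <- sapply(seq_BBB, function(b) sum(ead_BBB[ead_BBB < b]))
s_BBB_vec
```

Since seq_BBB stops at 9e+06, the two exposures above that cutpoint never enter any of these totals.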
>
>
> I apologize if I am not able to express my problem properly. My only  
> objective is first to sort the whole portfolio rating-wise and then,  
> within each of these rating-wise sorted data sets, to find the  
> total of eads based on various bands starting <1000000, 1000000 -  
> 2000000, 2000000 - 3000000, 3000000 - 4000000 and so on. Since the  
> database contains more than 50000 records, ead amounts ranging from a  
> few thousand to billions are present.
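
For the million-wide bands described here, the breaks can be generated with seq() instead of typed out. A rough sketch, assuming the test data from this post; width, breaks, band and agg are illustrative names, and the result holds per-band totals rather than running totals:

```r
# Test data copied from the original post
rating <- c("A", "AAA", "A", "BBB", "AA", "A", "BB", "BBB", "AA", "AA",
            "AA", "A", "A", "AA", "BB", "BBB", "AA", "A", "AAA", "BBB", "BBB", "BB",
            "A", "BB", "A", "AA", "B", "A", "AA", "BBB", "A", "BBB")
ead <- c(169229.93, 100, 5877794.25, 9530148.63, 75040962.06, 21000,
         1028360, 6000000, 17715000, 14430325.24, 1180946.57, 150000,
         167490, 81255.16, 54812.5, 3000, 1275702.94, 9100, 1763142.3,
         3283048.61, 1200000, 11800, 3000, 96894.02, 453671.72, 7590,
         106065.24, 940711.67, 2443000, 9500000, 39000, 1501939.67)
df <- data.frame(rating, ead)

# Equal-width bands of one million, extended far enough to cover max(ead)
width <- 1000000
breaks <- seq(0, ceiling(max(df$ead) / width) * width, by = width)
df$band <- cut(df$ead, breaks = breaks, dig.lab = 10)

# Total ead for every rating/band combination that actually occurs
agg <- aggregate(ead ~ rating + band, data = df, FUN = sum)
agg[order(agg$rating, agg$band), ]
```

No sorting or per-rating subsetting is needed beforehand; the order() call is only there to make the printed result easier to read.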
>
> Please guide
>
> Thanking  you all in advance
>
> Vincy
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT