[R] Applying function to multiple data

Thu Mar 3 16:14:53 CET 2011

Hi,

It might not be the best approach, but here is what I would do.

##########

1) If you have your data in 3 different data.frames:

#create a named list where each element is one of your data.frames
list_df <- vector(mode="list", length=3)
names(list_df) <- c("Bank", "Corporate", "Sovereign")

list_df[[1]] <- data.frame(k = c(1:8), ratings = c("A", "B", "C", "D", 
"E", "F", "G","H"), default_frequency = 
c(0.00229,0.01296,0.01794,0.04303,0.04641,0.06630,0.06862,0.06936))
list_df[[2]] <- data.frame(k = c(1:8), ratings = c("A", "B", "C", "D", 
"E", "F", "G","H"), default_frequency = 
c(0.00101,0.01433,0.02711,0.03701,0.04313,0.05600,0.06041,0.07112))
list_df[[3]] <- data.frame(k = c(1:8), ratings = c("A", "B", "C", "D", 
"E", "F", "G","H"), default_frequency = 
c(0.00210,0.01014,0.02001,0.04312,0.05114,0.06801,0.06997,0.07404))

#apply your function DP to each element of the list, i.e. to each data.frame:
out1 <- lapply(list_df, FUN=function(x) DP(k=x$k, 
ODF=x$default_frequency, ratings=x$ratings))
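
lapply() keeps the names of the list, so each result can be picked out by class name afterwards. Here is a small sketch of that behaviour. It uses two shortened data.frames and a hypothetical stand-in function max_freq() instead of DP, only to keep the example self-contained:

```r
# Two shortened data.frames standing in for the per-class tables
list_df <- list(
  Bank      = data.frame(k = 1:3, default_frequency = c(0.00229, 0.01296, 0.01794)),
  Corporate = data.frame(k = 1:3, default_frequency = c(0.00101, 0.01433, 0.02711))
)

# Hypothetical stand-in for DP (not from the original post):
# returns the largest observed default frequency
max_freq <- function(k, ODF) max(ODF)

# Same calling pattern as with DP: one call per list element
res <- lapply(list_df, FUN = function(x) max_freq(k = x$k, ODF = x$default_frequency))

names(res)      # the names of list_df are carried over to the result
res$Corporate   # result for one class, selected by name
res[["Bank"]]   # same thing with [[ ]], handy when the name is stored in a variable
```

The same access pattern should apply to out1, e.g. out1$Bank or out1[["Sovereign"]].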

##########

2) If you have your data in a single data.frame, as it looks from your 
example, I would first fill all the cells, so that it looks like this:

df2 <- structure(list(Class = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), 
.Label = c("Bank", "Corporate", "Sovereign"), class = "factor"), k = 
c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 
2L, 3L, 4L, 5L, 6L, 7L, 8L), ratings = structure(c(1L, 2L, 3L, 4L, 5L, 
6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 
8L), .Label = c("A", "B", "C", "D", "E", "F", "G", "H"), class = 
"factor"), default_frequency = c(0.00229, 0.01296, 0.01794, 0.04303, 
0.04641, 0.0663, 0.06862, 0.06936, 0.00101, 0.01433, 0.02711, 0.03701, 
0.04313, 0.056, 0.06041, 0.07112, 0.0021, 0.01014, 0.02001, 0.04312, 
0.05114, 0.06801, 0.06997, 0.07404)), .Names = c("Class", "k", 
"ratings", "default_frequency"), class = "data.frame", row.names = c(NA, 
-24L))
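
In case it helps, the "fill all the cells" step can be done in base R. This is only a sketch under some assumptions: the data.frame raw is made up for the example, the blank cells are read in as NA, and the first row always has a Class value.

```r
# Hypothetical raw table: Class is given only on the first row of each group
# and is NA elsewhere, mimicking the layout in the original question
raw <- data.frame(
  Class = c("Bank", NA, NA, "Corporate", NA, "Sovereign", NA, NA),
  k = c(1:3, 1:2, 1:3),
  stringsAsFactors = FALSE
)

# If the blanks were read as empty strings instead of NA, convert them first:
# raw$Class[raw$Class == ""] <- NA

# For each row, count how many non-missing Class values occurred up to that row;
# this is the index of the group the row belongs to
group_idx <- cumsum(!is.na(raw$Class))

# Replace every cell by the label of its group
raw$Class <- raw$Class[!is.na(raw$Class)][group_idx]

raw
```

The na.locf() function from the zoo package does the same kind of filling, if you prefer a ready-made function.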

#then split by Class:
list_df2 <- split(df2, df2$Class)
#and apply as before:
out2 <- lapply(list_df2, FUN=function(x) DP(k=x$k, 
ODF=x$default_frequency, ratings=x$ratings))

#or in one step using plyr:
library(plyr)
out3 <- dlply(.data=df2, .variables="Class", .fun=function(x) DP(k=x$k, 
ODF=x$default_frequency, ratings=x$ratings))

##########

3) All solutions give the same results:

all.equal(out1, out2, check.attributes=FALSE)
[1] TRUE
all.equal(out1, out3, check.attributes=FALSE)
[1] TRUE
all.equal(out2, out3, check.attributes=FALSE)
[1] TRUE
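
The original question also asked for each result to be saved as its own csv file (bank.csv, corporate.csv, etc.). Here is a minimal sketch of one way to do it, looping over the names of the result list so that it works for any number of classes. DP is copied from the question; the data are cut down to two classes, and out_dir and csv_files are names I made up for the example (tempdir() is only a placeholder folder).

```r
# DP as defined in the original question
DP <- function(k, ODF, ratings) {
  n <- length(ODF)
  tot_klnODF <- sum(k * log(ODF))
  tot_k <- sum(k)
  tot_lnODF <- sum(log(ODF))
  tot_k2 <- sum(k^2)
  slope <- exp((n * tot_klnODF - tot_k * tot_lnODF) / (n * tot_k2 - tot_k^2))
  intercept <- exp((tot_lnODF - log(slope) * tot_k) / n)
  IPD <- intercept * slope^k
  data.frame(ratings = ratings, default_probability = round(IPD, digits = 4))
}

# Two of the three classes from the question, in one data.frame
df2 <- data.frame(
  Class = rep(c("Bank", "Corporate"), each = 8),
  k = rep(1:8, times = 2),
  ratings = rep(LETTERS[1:8], times = 2),
  default_frequency = c(
    0.00229, 0.01296, 0.01794, 0.04303, 0.04641, 0.06630, 0.06862, 0.06936,
    0.00101, 0.01433, 0.02711, 0.03701, 0.04313, 0.05600, 0.06041, 0.07112)
)

# Split by Class and apply DP to each piece, as in solution 2
out <- lapply(split(df2, df2$Class), function(x)
  DP(k = x$k, ODF = x$default_frequency, ratings = x$ratings))

# Assumption: tempdir() is a placeholder, replace it with your own folder
out_dir <- tempdir()

# One file name per class, in lower case: bank.csv, corporate.csv, ...
csv_files <- file.path(out_dir, paste0(tolower(names(out)), ".csv"))

# Write each result to its own file
for (i in seq_along(out)) {
  write.csv(out[[i]], file = csv_files[i], row.names = FALSE)
}
```

The file names are built from names(out), so adding a class to the data adds a file without any change to the code.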

HTH,
Ivan

On 3/3/2011 11:06, Akshata Rao wrote:
> Dear R helpers,
>
> I know the R language at a basic level. This is my first post to this R
> forum. I have recently learned how to use functions and have been successful
> in writing a few of my own. However, I am not able to figure out how to apply
> a function to multiple sets of data.
>
> # MY QUERY
>
> Suppose I have the following data.frame
>
> df = data.frame(k = c(1:8), ratings = c("A", "B", "C", "D", "E", "F", "G",
> "H"),
> default_frequency =
> c(0.00229,0.01296,0.01794,0.04303,0.04641,0.06630,0.06862,0.06936))
>
> # -------------------------------
>
> DP = function(k, ODF, ratings)
> {
>   n <- length(ODF)
>   tot_klnODF <- sum(k * log(ODF))
>   tot_k <- sum(k)
>   tot_lnODF <- sum(log(ODF))
>   tot_k2 <- sum(k^2)
>   slope <- exp((n * tot_klnODF - tot_k * tot_lnODF) / (n * tot_k2 - tot_k^2))
>   intercept <- exp((tot_lnODF - log(slope) * tot_k) / n)
>   IPD <- intercept * slope^k
>
>   return(data.frame(ratings = ratings,
>                     default_probability = round(IPD, digits = 4)))
> }
>
> result = DP(k = df$k, ODF = df$default_frequency, ratings = df$ratings)
>
> # -------------------------------
>
> The above code gives me the following result. However, I am dealing with only
> one set of data here, as defined in 'df'.
>
>> result
>    ratings default_probability
> 1       A              0.0061
> 2       B              0.0094
> 3       C              0.0145
> 4       D              0.0222
> 5       E              0.0342
> 6       F              0.0527
> 7       G              0.0810
> 8       H              0.1247
>
>
> # MY PROBLEM
>
> Suppose I have data as given below
>
> Class       k   rating   default_frequency
> Bank        1   A        0.00229
>             2   B        0.01296
>             3   C        0.01794
>             4   D        0.04303
>             5   E        0.04641
>             6   F        0.06630
>             7   G        0.06862
>             8   H        0.06936
> Corporate   1   A        0.00101
>             2   B        0.01433
>             3   C        0.02711
>             4   D        0.03701
>             5   E        0.04313
>             6   F        0.05600
>             7   G        0.06041
>             8   H        0.07112
> Sovereign   1   A        0.00210
>             2   B        0.01014
>             3   C        0.02001
>             4   D        0.04312
>             5   E        0.05114
>             6   F        0.06801
>             7   G        0.06997
>             8   H        0.07404
>
> So I need to use the function "DP" defined above to generate three sets of
> results, viz. for Bank, Corporate and Sovereign, and save each of these results
> as a different csv file, say bank.csv, corporate.csv, etc. Again, please note
> that there could be 'm' classes. I was trying to use the apply
> function but things are not working for me. I would really appreciate the
> guidance. I hope I have been able to put my query in a neat manner.
>
> Regards and thanking you all in advance.
>
> Akshata Rao
>

-- 
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calandra at uni-hamburg.de

**********
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php