[R] Applying function to multiple data
Ivan Calandra
ivan.calandra at uni-hamburg.de
Thu Mar 3 16:14:53 CET 2011
Hi,
It might not be the best approach, but here is what I would do.
##########
1) If you have your data in 3 different data.frames:
#create a named list where each element is one of your data.frames
list_df <- vector(mode="list", length=3)
names(list_df) <- c("Bank", "Corporate", "Sovereign")
list_df[[1]] <- data.frame(k = c(1:8), ratings = c("A", "B", "C", "D",
"E", "F", "G","H"), default_frequency =
c(0.00229,0.01296,0.01794,0.04303,0.04641,0.06630,0.06862,0.06936))
list_df[[2]] <- data.frame(k = c(1:8), ratings = c("A", "B", "C", "D",
"E", "F", "G","H"), default_frequency =
c(0.00101,0.01433,0.02711,0.03701,0.04313,0.05600,0.06041,0.07112))
list_df[[3]] <- data.frame(k = c(1:8), ratings = c("A", "B", "C", "D",
"E", "F", "G","H"), default_frequency =
c(0.00210,0.01014,0.02001,0.04312,0.05114,0.06801,0.06997,0.07404))
#apply your function DP to each element of the list, i.e. to each data.frame:
out1 <- lapply(list_df, FUN=function(x) DP(k=x$k,
ODF=x$default_frequency, ratings=x$ratings))
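For completeness, here is a self-contained sketch of approach 1 that can be pasted into a fresh session. It assumes DP is exactly the function from your post (only reformatted, not changed), and it builds the named list in a single list() call instead of filling a pre-allocated one; both give the same list. Because lapply() keeps the names of its input, each result can then be picked out by class.

```r
# DP copied from the question, only reformatted
DP <- function(k, ODF, ratings) {
  n <- length(ODF)
  tot_klnODF <- sum(k * log(ODF))
  tot_k <- sum(k)
  tot_lnODF <- sum(log(ODF))
  tot_k2 <- sum(k^2)
  slope <- exp((n * tot_klnODF - tot_k * tot_lnODF) / (n * tot_k2 - tot_k^2))
  intercept <- exp((tot_lnODF - log(slope) * tot_k) / n)
  IPD <- intercept * slope^k
  data.frame(ratings = ratings, default_probability = round(IPD, digits = 4))
}

# Named list built in one call: one data.frame per class
ratings <- c("A", "B", "C", "D", "E", "F", "G", "H")
list_df <- list(
  Bank = data.frame(k = 1:8, ratings = ratings, default_frequency =
    c(0.00229, 0.01296, 0.01794, 0.04303, 0.04641, 0.06630, 0.06862, 0.06936)),
  Corporate = data.frame(k = 1:8, ratings = ratings, default_frequency =
    c(0.00101, 0.01433, 0.02711, 0.03701, 0.04313, 0.05600, 0.06041, 0.07112)),
  Sovereign = data.frame(k = 1:8, ratings = ratings, default_frequency =
    c(0.00210, 0.01014, 0.02001, 0.04312, 0.05114, 0.06801, 0.06997, 0.07404)))

# Apply DP to every data.frame; the result is a list with the same names
out1 <- lapply(list_df, function(x)
  DP(k = x$k, ODF = x$default_frequency, ratings = x$ratings))

names(out1)   # "Bank" "Corporate" "Sovereign"
out1$Bank     # should match 'result' from the question
```

Adding a fourth class only means adding a fourth element to the list; the lapply() call does not change.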
##########
2) If you have your data in a single data.frame, as your example suggests,
I would first fill in the empty Class cells, so that it looks like this:
#ratings is stored the same way as in list_df, so that the results of 1) and 2) compare equal
df2 <- data.frame(Class = rep(c("Bank", "Corporate", "Sovereign"), each = 8),
  k = rep(1:8, times = 3),
  ratings = rep(c("A", "B", "C", "D", "E", "F", "G", "H"), times = 3),
  default_frequency = c(0.00229, 0.01296, 0.01794, 0.04303, 0.04641, 0.06630,
    0.06862, 0.06936, 0.00101, 0.01433, 0.02711, 0.03701, 0.04313, 0.05600,
    0.06041, 0.07112, 0.00210, 0.01014, 0.02001, 0.04312, 0.05114, 0.06801,
    0.06997, 0.07404))
#then split by Class:
list_df2 <- split(df2, df2$Class)
#and apply as before:
out2 <- lapply(list_df2, FUN=function(x) DP(k=x$k,
ODF=x$default_frequency, ratings=x$ratings))
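If it is not clear what split() hands over to lapply(), this small sketch shows it. It rebuilds df2 with rep() (a shorter way of typing the same 24 rows) and only inspects the pieces, so DP is not needed here.

```r
# Same 24 rows as df2: 8 ratings for each of 3 classes
df2 <- data.frame(
  Class = rep(c("Bank", "Corporate", "Sovereign"), each = 8),
  k = rep(1:8, times = 3),
  ratings = rep(c("A", "B", "C", "D", "E", "F", "G", "H"), times = 3),
  default_frequency = c(0.00229, 0.01296, 0.01794, 0.04303, 0.04641, 0.06630, 0.06862, 0.06936,
                        0.00101, 0.01433, 0.02711, 0.03701, 0.04313, 0.05600, 0.06041, 0.07112,
                        0.00210, 0.01014, 0.02001, 0.04312, 0.05114, 0.06801, 0.06997, 0.07404))

# split() returns a named list with one data.frame per level of Class
list_df2 <- split(df2, df2$Class)
names(list_df2)            # "Bank" "Corporate" "Sovereign"
sapply(list_df2, nrow)     # 8 rows per class
lapply(list_df2, head, 2)  # first two rows of each piece
```

Each piece keeps its original row names (9 to 16 for Corporate), which does not matter for DP.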
#or in one step using plyr:
library(plyr)
out3 <- dlply(.data=df2, .variables="Class", .fun=function(x) DP(k=x$k,
ODF=x$default_frequency, ratings=x$ratings))
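If you would rather not load plyr, base R's by() also splits and applies in one step. This is a swapped-in alternative, not one of the solutions compared below, so treat it as a sketch; it assumes DP as in your post and the same df2.

```r
# DP copied from the question, only reformatted
DP <- function(k, ODF, ratings) {
  n <- length(ODF)
  tot_klnODF <- sum(k * log(ODF))
  tot_k <- sum(k)
  tot_lnODF <- sum(log(ODF))
  tot_k2 <- sum(k^2)
  slope <- exp((n * tot_klnODF - tot_k * tot_lnODF) / (n * tot_k2 - tot_k^2))
  intercept <- exp((tot_lnODF - log(slope) * tot_k) / n)
  IPD <- intercept * slope^k
  data.frame(ratings = ratings, default_probability = round(IPD, digits = 4))
}

df2 <- data.frame(
  Class = rep(c("Bank", "Corporate", "Sovereign"), each = 8),
  k = rep(1:8, times = 3),
  ratings = rep(c("A", "B", "C", "D", "E", "F", "G", "H"), times = 3),
  default_frequency = c(0.00229, 0.01296, 0.01794, 0.04303, 0.04641, 0.06630, 0.06862, 0.06936,
                        0.00101, 0.01433, 0.02711, 0.03701, 0.04313, 0.05600, 0.06041, 0.07112,
                        0.00210, 0.01014, 0.02001, 0.04312, 0.05114, 0.06801, 0.06997, 0.07404))

# by() splits df2 by Class and calls the function on each sub-data.frame
out4 <- by(df2, df2$Class, function(x)
  DP(k = x$k, ODF = x$default_frequency, ratings = x$ratings))

out4[["Bank"]]   # one result per class, extracted by name
```

by() returns an object of class "by" rather than a plain list, so results are extracted with [[ as shown.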
##########
3) All three solutions give the same results:
all.equal(out1, out2, check.attributes=FALSE)
[1] TRUE
all.equal(out1, out3, check.attributes=FALSE)
[1] TRUE
all.equal(out2, out3, check.attributes=FALSE)
[1] TRUE
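You also asked how to save each result to its own csv file. One way, sketched below under the assumption that the results are in a named list like out1 or out2, is to loop over the names and call write.csv() once per class; the file names are built from the class names, so this works for any number m of classes. The sketch writes into tempdir() so it is safe to run; set out_dir to "." (or any folder) to write bank.csv, corporate.csv and sovereign.csv to your working directory.

```r
# DP copied from the question, only reformatted
DP <- function(k, ODF, ratings) {
  n <- length(ODF)
  tot_klnODF <- sum(k * log(ODF))
  tot_k <- sum(k)
  tot_lnODF <- sum(log(ODF))
  tot_k2 <- sum(k^2)
  slope <- exp((n * tot_klnODF - tot_k * tot_lnODF) / (n * tot_k2 - tot_k^2))
  intercept <- exp((tot_lnODF - log(slope) * tot_k) / n)
  IPD <- intercept * slope^k
  data.frame(ratings = ratings, default_probability = round(IPD, digits = 4))
}

df2 <- data.frame(
  Class = rep(c("Bank", "Corporate", "Sovereign"), each = 8),
  k = rep(1:8, times = 3),
  ratings = rep(c("A", "B", "C", "D", "E", "F", "G", "H"), times = 3),
  default_frequency = c(0.00229, 0.01296, 0.01794, 0.04303, 0.04641, 0.06630, 0.06862, 0.06936,
                        0.00101, 0.01433, 0.02711, 0.03701, 0.04313, 0.05600, 0.06041, 0.07112,
                        0.00210, 0.01014, 0.02001, 0.04312, 0.05114, 0.06801, 0.06997, 0.07404))

# Named list of results, as in approach 2
out <- lapply(split(df2, df2$Class), function(x)
  DP(k = x$k, ODF = x$default_frequency, ratings = x$ratings))

# Assumption: a temporary folder for the demo; use "." for the working directory
out_dir <- tempdir()

# One csv per class, named after the class in lower case (bank.csv, ...)
for (cl in names(out)) {
  write.csv(out[[cl]], file = file.path(out_dir, paste0(tolower(cl), ".csv")),
            row.names = FALSE)
}

list.files(out_dir, pattern = "\\.csv$")
```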
HTH,
Ivan
On 3/3/2011 11:06, Akshata Rao wrote:
> Dear R helpers,
>
> I know the R language at a beginner level. This is my first post to this R
> forum. I have recently learned how to write functions and have been successful
> in writing a few on my own. However, I am not able to figure out how to apply
> a function to multiple sets of data.
>
> # MY QUERY
>
> Suppose I have the following data.frame:
>
> df = data.frame(k = c(1:8), ratings = c("A", "B", "C", "D", "E", "F", "G", "H"),
>                 default_frequency =
>                   c(0.00229,0.01296,0.01794,0.04303,0.04641,0.06630,0.06862,0.06936))
>
> # -------------------------------
>
> DP = function(k, ODF, ratings)
> {
>   n <- length(ODF)
>   tot_klnODF <- sum(k*log(ODF))
>   tot_k <- sum(k)
>   tot_lnODF <- sum(log(ODF))
>   tot_k2 <- sum(k^2)
>   slope <- exp((n * tot_klnODF - tot_k * tot_lnODF)/(n * tot_k2 - tot_k^2))
>   intercept <- exp((tot_lnODF - log(slope)* tot_k)/n)
>   IPD <- intercept * slope^k
>
>   return(data.frame(ratings = ratings,
>                     default_probability = round(IPD, digits = 4)))
> }
>
> result = DP(k = df$k, ODF = df$default_frequency, ratings = df$ratings)
>
> # -------------------------------
>
> The above code gives me the following result. However, it only handles the
> single set of data defined in 'df'.
>
>> result
> ratings default_probability
> 1 A 0.0061
> 2 B 0.0094
> 3 C 0.0145
> 4 D 0.0222
> 5 E 0.0342
> 6 F 0.0527
> 7 G 0.0810
> 8 H 0.1247
>
>
> # MY PROBLEM
>
> Suppose I have the data given below:
>
> Class k rating default_frequency
> Bank 1 A 0.00229
> 2 B 0.01296
> 3 C 0.01794
> 4 D 0.04303
> 5 E 0.04641
> 6 F 0.06630
> 7 G 0.06862
> 8 H 0.06936
> Corporate 1 A 0.00101
> 2 B 0.01433
> 3 C 0.02711
> 4 D 0.03701
> 5 E 0.04313
> 6 F 0.05600
> 7 G 0.06041
> 8 H 0.07112
> Sovereign 1 A 0.00210
> 2 B 0.01014
> 3 C 0.02001
> 4 D 0.04312
> 5 E 0.05114
> 6 F 0.06801
> 7 G 0.06997
> 8 H 0.07404
>
> So I need to use the function "DP" defined above to generate three sets of
> results, viz. for Bank, Corporate and Sovereign, and save each of these results
> as a different csv file, say bank.csv, corporate.csv, etc. Again, please note
> that there could be any number 'm' of classes. I was trying to use the apply
> function, but things are not working for me. I would really appreciate your
> guidance. I hope I have put my query in a clear manner.
>
> Regards and thanking you all in advance.
>
> Akshata Rao
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calandra at uni-hamburg.de
**********
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php