[R] Best way/practice to create a new data frame from two given ones with last column computed from the two data frames?
Marius Hofert
m_hofert at web.de
Fri Aug 19 00:00:52 CEST 2011
Dear all,
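okay, I found a one-liner based on mutate() from the plyr package:
library(plyr)
## sort df1 first, so that Year and Group stay aligned with the computed Value
(df3 <- mutate(df1[with(df1, order(Year,Group)),], Value=Value / df2[with(df2, order(Year,Group)),"Value"]))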
Cheers,
Marius
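A base-R sketch of the same idea, for readers without plyr: transform() plays roughly the role of plyr's mutate(). Both data frames are sorted by the same keys before dividing, so that row i of each refers to the same (Year, Group) pair. The data are rebuilt from the example quoted below. The names s1 and s2 are illustrative only.

```r
# Example data from the original post, with the rows shuffled
df1 <- data.frame(Year = rep(2001:2010, each = 2), Group = c("Group 1", "Group 2"), Value = 1:20)
df2 <- data.frame(Year = rep(2001:2010, each = 2), Group = c("Group 1", "Group 2"), Value = 21:40)
df1 <- df1[sample(nrow(df1)), ]
df2 <- df2[sample(nrow(df2)), ]

# Sort both frames by (Year, Group) so that row i of s1 and s2 refer to the same pair
s1 <- df1[order(df1$Year, df1$Group), ]
s2 <- df2[order(df2$Year, df2$Group), ]

# transform() evaluates Value inside s1; s2$Value is found in the calling environment
df3 <- transform(s1, Value = Value / s2$Value)
df3
```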
On 2011-08-18, at 20:41, Marius Hofert wrote:
> Dear expeRts,
>
> What is the best approach to create a third data frame from two given ones when
> the new (third) data frame has its last column computed from the last columns of the
> two given data frames?
>
> ## Okay, sounds complicated, so here is an example. Assume we have the two data frames:
> df1 <- data.frame(Year=rep(2001:2010, each=2), Group=c("Group 1","Group 2"), Value=1:20)
> df2 <- data.frame(Year=rep(2001:2010, each=2), Group=c("Group 1","Group 2"), Value=21:40)
>
> ## To make this a bit more fun, let's say the rows of each data frame are in a different (random) order:
> (df1 <- df1[sample(1:nrow(df1)),])
> (df2 <- df2[sample(1:nrow(df2)),])
>
> ## Now I would like to create a third data frame that has "Year" in column one,
> ## "Group" in column two, and each entry of column three should consist of the
> ## corresponding entry in df1 divided by the one in df2.
>
> ## To achieve this, one could do:
> df3 <- df1[with(df1, order(Year,Group)),]
> df3$Value <- df3$Value/df2[with(df2, order(Year,Group)),]$Value
> colnames(df3)[3] <- "New Value" # typically, the column name changes
>
> ## or one could do:
> df3 <- df1[with(df1, order(Year,Group)), -ncol(df1)]
> df3 <- cbind(df3, "New Value"=df1[with(df1, order(Year,Group)),]$Value/df2[with(df2, order(Year,Group)),]$Value)
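Both approaches above sort the data first. Here is a minimal alternative sketch that keeps df1's original row order. It looks up the matching df2 row for every df1 row with match(). This assumes every (Year, Group) pair occurs exactly once in each data frame. The helper key() and the "|" separator are hypothetical choices for illustration; the separator must not occur in the data.

```r
# Example data from the original post, with the rows shuffled
df1 <- data.frame(Year = rep(2001:2010, each = 2), Group = c("Group 1", "Group 2"), Value = 1:20)
df2 <- data.frame(Year = rep(2001:2010, each = 2), Group = c("Group 1", "Group 2"), Value = 21:40)
df1 <- df1[sample(nrow(df1)), ]
df2 <- df2[sample(nrow(df2)), ]

# Hypothetical helper: one key string per row, built from Year and Group
key <- function(d) paste(d$Year, d$Group, sep = "|")

idx <- match(key(df1), key(df2))   # row of df2 that matches each row of df1
stopifnot(!anyNA(idx))              # every (Year, Group) in df1 must occur in df2

df3 <- df1[, c("Year", "Group")]    # keeps df1's row order
df3[["New Value"]] <- df1$Value / df2$Value[idx]
df3
```

Because no sorting is involved, the result lines up with df1 row by row. This can be convenient when df1's order matters.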
>
> ## Is there a more elegant solution? (maybe with ddply?)
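On the question of a more elegant solution, one base-R option is merge(). This is an alternative to ddply and not the poster's method. merge() pairs rows by the key columns, so the row order of the inputs does not matter. What follows is a minimal sketch, assuming Year and Group identify each row uniquely. The name m and the suffixes ".1"/".2" are arbitrary choices.

```r
# Example data from the original post, with the rows shuffled
df1 <- data.frame(Year = rep(2001:2010, each = 2), Group = c("Group 1", "Group 2"), Value = 1:20)
df2 <- data.frame(Year = rep(2001:2010, each = 2), Group = c("Group 1", "Group 2"), Value = 21:40)
df1 <- df1[sample(nrow(df1)), ]
df2 <- df2[sample(nrow(df2)), ]

# merge() matches rows on the key columns; the non-key column Value occurs
# in both frames, so the two copies get the suffixes ".1" and ".2"
m <- merge(df1, df2, by = c("Year", "Group"), suffixes = c(".1", ".2"))

# Keep the keys and the ratio; check.names = FALSE keeps the space in "New Value"
df3 <- data.frame(Year = m$Year, Group = m$Group,
                  "New Value" = m$Value.1 / m$Value.2, check.names = FALSE)
df3
```

By default merge() performs an inner join and sorts the result by the key columns. (Year, Group) pairs missing from either data frame are dropped unless all = TRUE is given, in which case they are kept with NA.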
>
> ## By the way:
> df1[,"Value"] # works
> df1[,-"Value"] # does not work: Error in -"Value" : invalid argument to unary operator
> ## Is there a way to exclude columns by name? That would make the code more readable.
> ## I know one could use...
> subset(df1, select=c("Year","Group"))
> ## ... but with lots of columns it seems a bit tedious to first remove the name of the
> ## column that should be dropped and then pass all remaining column names to "select".
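On excluding columns by name: df1[,-"Value"] fails because unary minus is not defined for character strings. Below is a short sketch of three base-R alternatives, where the vector excl is an illustrative name. Note that subset() does accept negated, unquoted column names in select, so the remaining columns need not be listed.

```r
df1 <- data.frame(Year = rep(2001:2010, each = 2), Group = c("Group 1", "Group 2"), Value = 1:20)

excl <- "Value"  # names of the columns to drop (illustrative)

# Option 1: keep every column whose name is not in excl
a <- df1[, setdiff(names(df1), excl), drop = FALSE]

# Option 2: the same selection as a logical index
b <- df1[, !(names(df1) %in% excl), drop = FALSE]

# Option 3: subset() evaluates select with each column name standing for its
# position, so unquoted names can be negated to drop columns
d <- subset(df1, select = -Value)
e <- subset(df1, select = -c(Group, Value))
```

drop = FALSE in options 1 and 2 keeps a data frame even when only one column remains. subset() already uses drop = FALSE by default.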
>
>
> Cheers,
>
> Marius
More information about the R-help
mailing list