[R] Populate one data frame with values from another dataframe for rows that match

Eric Berger ericjberger at gmail.com
Sat Oct 14 11:42:34 CEST 2017


Hi Kevin,
I think there are issues with Rui's proposed solution. For example, if
there are rows in myDF1 which have a studyno
which does not match any row in myDF2, then you will lose those rows. In
your original request you said that you wanted to keep those rows.

To demonstrate my point I need to modify your sample data. Specifically, I
changed some studyno settings in myDF1, and also the entries of pf_mcl in
myDF1.

myDF1 <- structure(list(studyno = c("J1000/8", "J1000/9", "J1000/9",
"J1000/9",
"J1000/5", "J1000/6"), date = structure(c(17123, 17127, 17135,
17144, 17148, 17155), class = "Date"), pf_mcl = c(1:6
), year = c(2016, 2016, 2016, 2016, 2016, 2016)), .Names = c("studyno",
"date", "pf_mcl", "year"), row.names = c(NA, 6L), class = "data.frame")

myDF2 <- structure(list(studyno = c("J740/4", "J1000/9", "J895/7", "J931/6",
"J609/1", "J941/3"), pf_mcl = c(0L, 0L, 0L, 0L, 0L, 0L)), .Names =
c("studyno",
"pf_mcl"), row.names = c(NA, 6L), class = "data.frame")

#Rui's proposal gives the following result
# studyno       date year pf_mcl
# 1 J1000/9 2016-11-22 2016      0
# 2 J1000/9 2016-11-30 2016      0
# 3 J1000/9 2016-12-09 2016      0

My proposal

library(dplyr)

myDF1$studyno <- as.character(myDF1$studyno)
myDF2$studyno <- as.character(myDF2$studyno)
myDF3 <- merge(myDF1, myDF2, by="studyno", all.x=TRUE ) %>%
                dplyr::mutate( pf_mcl = ifelse( is.na(pf_mcl.y), pf_mcl.x,
pf_mcl.y ) ) %>%
                dplyr::select( studyno, date, pf_mcl )

# The results of this approach
#  studyno       date pf_mcl
# 1 J1000/5 2016-12-13      5
# 2 J1000/6 2016-12-20      6
# 3 J1000/8 2016-11-18      1
# 4 J1000/9 2016-11-22      0
# 5 J1000/9 2016-11-30      0
# 6 J1000/9 2016-12-09      0

Comparing the two results you see that no rows have been dropped in my
approach.

HTH,

Eric





On Sat, Oct 14, 2017 at 8:49 AM, Kevin Wamae <KWamae at kemri-wellcome.org>
wrote:

> Dear @Rui Barradas, thank you for the solution. It works perfectly.
>
>
> On 13/10/2017, 23:35, "Rui Barradas" <ruipbarradas at sapo.pt> wrote:
>
>     Hello,
>
>     Try the following.
>
>
>     myDF1$studyno <- as.character(myDF1$studyno)
>     myDF2$studyno <- as.character(myDF2$studyno)
>     i1 <- which(names(myDF1) == "pf_mcl")
>
>     merge(myDF1[-i1], myDF2, by = "studyno")
>
>
>     Hope this helps,
>
>     Rui Barradas
>
>     Em 13-10-2017 20:09, Kevin Wamae escreveu:
>     > I'm trying to populate the column “pf_mcl” in myDF1 with values from
> myDF2, where rows match based on column "studyno" but the solutions I have
> found so far don't seem to be giving me the desired output.
>     >
>     > Below is a snapshot of the data.frames.
>     >
>     > myDF1 <- structure(list(studyno = c("J1000/9", "J1000/9", "J1000/9",
> "J1000/9",
>     > "J1000/9", "J1000/9"), date = structure(c(17123, 17127, 17135,
>     > 17144, 17148, 17155), class = "Date"), pf_mcl = c(NA_integer_,
>     > NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
>     > ), year = c(2016, 2016, 2016, 2016, 2016, 2016)), .Names =
> c("studyno",
>     > "date", "pf_mcl", "year"), row.names = c(NA, 6L), class =
> "data.frame")
>     >
>     > myDF2 <- structure(list(studyno = c("J740/4", "J1000/9", "J895/7",
> "J931/6",
>     > "J609/1", "J941/3"), pf_mcl = c(0L, 0L, 0L, 0L, 0L, 0L)), .Names =
> c("studyno",
>     > "pf_mcl"), row.names = c(NA, 6L), class = "data.frame")
>     >
>     > myDF2 is a well curated subset of myDF1. Some rows in the two
> datasets match based on "studyno", one may find that values are missing in
> myDF1$pf_mcl or the values are wrong.
>     >
>     > All I want to do is identify a matching row in myDF2 and populate
> myDF1$pf_mcl with the value in myDF2$pf_mcl. If a row does not match based
> on “studyno”, the value should remain the same.
>     >
>     > It's probably worth mentioning, the two data frames have other
> columns...I have selected a few for example purposes.
>     >
>     >
>     >
>     > ____________________________________________________________
> __________
>     >
>     > This e-mail contains information which is confidential. It is
> intended only for the use of the named recipient. If you have received this
> e-mail in error, please let us know by replying to the sender, and
> immediately delete it from your system.  Please note, that in these
> circumstances, the use, disclosure, distribution or copying of this
> information is strictly prohibited. KEMRI-Wellcome Trust Programme cannot
> accept any responsibility for the  accuracy or completeness of this message
> as it has been transmitted over a public network. Although the Programme
> has taken reasonable precautions to ensure no viruses are present in
> emails, it cannot accept responsibility for any loss or damage arising from
> the use of the email or attachments. Any views expressed in this message
> are those of the individual sender, except where the sender specifically
> states them to be the views of KEMRI-Wellcome Trust Programme.
>     > ____________________________________________________________
> __________
>     >
>     >   [[alternative HTML version deleted]]
>     >
>     > ______________________________________________
>     > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>     > https://stat.ethz.ch/mailman/listinfo/r-help
>     > PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
>     > and provide commented, minimal, self-contained, reproducible code.
>     >
>
>
>
> ______________________________________________________________________
>
> This e-mail contains information which is confidential. It is intended
> only for the use of the named recipient. If you have received this e-mail
> in error, please let us know by replying to the sender, and immediately
> delete it from your system.  Please note, that in these circumstances, the
> use, disclosure, distribution or copying of this information is strictly
> prohibited. KEMRI-Wellcome Trust Programme cannot accept any responsibility
> for the  accuracy or completeness of this message as it has been
> transmitted over a public network. Although the Programme has taken
> reasonable precautions to ensure no viruses are present in emails, it
> cannot accept responsibility for any loss or damage arising from the use of
> the email or attachments. Any views expressed in this message are those of
> the individual sender, except where the sender specifically states them to
> be the views of KEMRI-Wellcome Trust Programme.
> ______________________________________________________________________
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

	[[alternative HTML version deleted]]



More information about the R-help mailing list