[R] Help to check data before putting it in a database

Joshua Wiley jwiley.psych at gmail.com
Tue Apr 5 17:21:41 CEST 2011

Hi Ulisses,

Look at the help pages ?match (which also documents %in%) and ?rbind.
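As a rough illustration of what those functions do, here is a minimal sketch. The vectors `old_players` and `new_players` and the two tiny data frames are made up for the example and are not part of your data.

```r
## Hypothetical player names, for illustration only
old_players <- c("A", "B", "C", "D")
new_players <- c("C", "D", "E", "F")

## match() gives the position of each new name in the old vector,
## or NA when the name is not found
pos <- match(new_players, old_players)

## %in% is TRUE where the name is already known, FALSE otherwise
known <- new_players %in% old_players

## rbind() stacks data frames that have the same column names
stacked <- rbind(data.frame(goals = 1, players = "A"),
                 data.frame(goals = 2, players = "B"))
```

Here `known` can be used directly as a logical row index, which is what the function further down relies on.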

If you do not want to do it by hand, you can make a little function as below.



d1 <- data.frame(goals = 4:1, players = LETTERS[1:4])
d2 <- data.frame(goals = c(1, 3, 2, 5), players = LETTERS[3:6])

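f <- function(old, new, check) {
  ## TRUE for rows of 'new' whose value in column 'check' already occurs in 'old'
  index <- new[, check] %in% old[, check]
  dat <- rbind(old, new[index, ])  # append the rows with known names
  tocheck <- new[!index, ]         # rows with unknown names, for manual review
  list(merged = dat, tocheck = tocheck)
}
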
dmerged <- f(d1, d2, "players")
## inspect dmerged$tocheck, fix any wrong names there, and once it is correct
dfinal <- do.call("rbind", dmerged)
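The manual correction step could look something like the sketch below. It is only an illustration under assumptions: the data frames `old` and `new` are hypothetical copies of the toy data, and I am pretending that "E" was a typo for "B" while "F" is a genuinely new player. I also set `stringsAsFactors = FALSE`, because if the name column is a factor, assigning a name that is not one of its levels gives NA with a warning.

```r
## Hypothetical data with the same shape as the toy example
old <- data.frame(goals = 4:1, players = LETTERS[1:4],
                  stringsAsFactors = FALSE)
new <- data.frame(goals = c(1, 3, 2, 5), players = LETTERS[3:6],
                  stringsAsFactors = FALSE)

## Split the new rows into known and unknown player names
known <- new$players %in% old$players
merged <- rbind(old, new[known, ])
tocheck <- new[!known, ]

## Manual correction (assumed for the example): "E" was a typo for "B",
## "F" is a real new player and is left as it is
tocheck$players[tocheck$players == "E"] <- "B"

## Add the corrected rows and the new-player rows to the database
final <- rbind(merged, tocheck)
```

After the correction every row of `new` ends up in `final`, so it has the 4 old rows plus the 4 new ones.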

On Tue, Apr 5, 2011 at 8:06 AM, Ulisses.Camargo
<moliterno.camargo at gmail.com> wrote:
> The example scene:
> I have a database with stats about each goal made by my soccer team. This
> database (a data frame in R) is organized in lines (goals) with a set of
> columns containing data about these goals (player name, tactic position,
> etc). For now, this database will be called "data.frame1".
> What I need is to feed this "data.frame1" with new information about my team
> goals. I will call this new information "data.frame2". This set of new goals
> is organized in the same way as in "data.frame1" (equal numbers of cols).
> Where help is needed:
> I need help finding a way to check the player-name column in
> "data.frame2" before feeding "data.frame1" with it. What I need is a way to
> verify the name of the player on each line of "data.frame2" against the names
> of players that already exist in a column of "data.frame1". Moreover, I need R
> to do two main things:
> First, the lines of "data.frame2" with player names that already exist in
> "data.frame1" must be added to "data.frame1".
> Second, lines of "data.frame2" with player names that do not exist in
> "data.frame1" must be listed in an output to be manually checked and
> corrected.
> After this verification, corrected lines and new-player-names lines must be
> incorporated in "data.frame1".
> What I want is to guarantee that there will be no lines with wrong player
> names in my database.
> At the same time, my script must permit new information to be added (new
> player names).
> Is there somebody who could help me with this?
> Thanks for your attention
> Best wishes
> Ulisses

Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
