[R] How to create an ifelse statement where it matches a different data.frame variable
Jeff Newmiller
jdnewmil at dcn.davis.ca.us
Thu Mar 27 03:10:14 CET 2014
Please keep the mailing list included by using "reply-all"... I am not
doing this as a private consultation.
Your sample data is a step forward, but it is still not reproducible. You
could Google "R reproducible example" and find
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example.
Try running the following R code, and decipher each step using the R help
documentation (such as typing ?dput at the R command line) and the
Introduction to R document (particularly about indexing):
# What you should have provided
# dput( PastData )
PastData <- structure(list(Name = c("aaa", "ccc", "ddd"), Code = c(1L, 3L, 4L)),
    .Names = c("Name", "Code"), class = "data.frame", row.names = c(NA, -3L))
# dput( CurrentData )
CurrentData <- structure(list(Name = c("aaa", "bbb", NA, "ddd"), Code = 1:4),
    .Names = c("Name", "Code"), class = "data.frame", row.names = c(NA, -4L))
# Want to fix CurrentData to be like NewData
# dput( NewData )
NewData <- structure(list(Name = c("aaa", "bbb", "ccc", "ddd"), Code = 1:4),
    .Names = c("Name", "Code"), row.names = c(NA, -4L), class = "data.frame")
# What the answer might look like if you had provided the above
# Learning sequence... what is the current code vector?
CurrentData$Code
# Which indexes in PastData have the codes from CurrentData?
match( CurrentData$Code, PastData$Code )
# How do we look up the corresponding Name values?
PastData[ match( CurrentData$Code, PastData$Code ), "Name" ]
# At this point, we have a vector of names from PastData corresponding to
# codes in CurrentData
# Pick only those names from PastData where the CurrentData$Name is NA
ifelse( is.na( CurrentData$Name ),
        PastData[ match( CurrentData$Code, PastData$Code ), "Name" ],
        CurrentData$Name )
# Proposed new data frame
MyNewData <- CurrentData
MyNewData$Name <- ifelse( is.na( CurrentData$Name ),
                          PastData[ match( CurrentData$Code, PastData$Code ), "Name" ],
                          CurrentData$Name )
# Check whether we achieved your goal
identical( MyNewData, NewData )
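The same lookup can also be written as an indexed assignment that touches only the rows where Name is missing. This follows the spirit of the idx approach quoted further down, but uses match() so the two data frames need not have the same number of rows. It is a minimal sketch, assuming the same sample data as above (rebuilt here with data.frame() so the block runs on its own); MyNewData2 and pos are names introduced only for this illustration.

```r
# Same sample data as above, rebuilt so this block is self-contained
PastData <- data.frame(Name = c("aaa", "ccc", "ddd"), Code = c(1L, 3L, 4L),
                       stringsAsFactors = FALSE)
CurrentData <- data.frame(Name = c("aaa", "bbb", NA, "ddd"), Code = 1:4,
                          stringsAsFactors = FALSE)

MyNewData2 <- CurrentData
# Rows whose Name is missing
idx <- is.na(MyNewData2$Name)
# Row positions in PastData for just those codes (NA if a code is absent there)
pos <- match(MyNewData2$Code[idx], PastData$Code)
# Fill in only the missing names; a code with no match in PastData stays NA
MyNewData2$Name[idx] <- PastData$Name[pos]
MyNewData2
```

Because only the NA rows are looked up and assigned, existing names are never rewritten, which may matter when the vectors are large.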
---
Please note that your data frame might already have converted the Name
column to a factor, so the above code might need to be adapted or you
might be better off re-importing your data so that the Name column is a
character vector instead of a factor. If you had read about and created a
reproducible example we would already know whether this was going to be a
problem.
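
To illustrate the factor concern, here is a minimal sketch, assuming the Name columns were imported as factors (the default for read.csv and data.frame before R 4.0.0). FactorPast, FactorCurrent, bad and good are hypothetical names used only for this illustration. ifelse() does not keep factor attributes, so the result is no longer the character names; converting with as.character() first avoids that.

```r
# Hypothetical factor versions of the sample data (names invented for illustration)
FactorPast <- data.frame(Name = c("aaa", "ccc", "ddd"), Code = c(1L, 3L, 4L),
                         stringsAsFactors = TRUE)
FactorCurrent <- data.frame(Name = c("aaa", "bbb", NA, "ddd"), Code = 1:4,
                            stringsAsFactors = TRUE)

# ifelse() drops the factor attributes, so this does not return the names
bad <- ifelse(is.na(FactorCurrent$Name),
              FactorPast[match(FactorCurrent$Code, FactorPast$Code), "Name"],
              FactorCurrent$Name)

# Convert both columns to character first, then the same expression works
FactorPast$Name <- as.character(FactorPast$Name)
FactorCurrent$Name <- as.character(FactorCurrent$Name)
good <- ifelse(is.na(FactorCurrent$Name),
               FactorPast[match(FactorCurrent$Code, FactorPast$Code), "Name"],
               FactorCurrent$Name)
good
```

Re-importing with stringsAsFactors = FALSE (an argument of both read.csv and data.frame) would avoid the conversion step entirely.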
On Wed, 26 Mar 2014, Megan Weigel wrote:
> I apologize... I will make time to read the Posting Guide.
>
> The data looks similar to the following:
>
> PastData:
> Name Code
> aaa 1
> ccc 3
> ddd 4
>
> CurrentData:
> Name Code
> aaa 1
> bbb 2
> NA 3
> ddd 4
>
> It should look like this...
>
> NewData:
> Name Code
> aaa 1
> bbb 2
> ccc 3
> ddd 4
>
> The code has to replace NA with the name that corresponds to the same code number in PastData. The two
> data frames also do not have the same number of rows.
>
> Thank you very much,
>
> Johnson
>
>
> On Wed, Mar 26, 2014 at 1:36 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
> Please read and in the future follow the Posting Guide, which requests that you provide a
> reproducible example... that is, a series of R statements that we can run to get us to your
> problem point with a small sample data set that resembles yours. Forging on anyway...
>
> The ifelse function applies to vectors, not data frames. That is, as long as both data
> frames have the same number of rows, you should be able to do things like
>
> CurrentDataFrame$Name <- ifelse( CurrentDataFrame$Name=="NA", PastDataFrame$Name,
> CurrentDataFrame$Name)
>
> Please note that NA is completely different than "NA" (read the Introduction to R document
> that comes with R if you need a refresher). If you are really trying to weed out NA values
> then you would need to do something like
>
> CurrentDataFrame$Name <- ifelse( is.na(CurrentDataFrame$Name), PastDataFrame$Name,
> CurrentDataFrame$Name)
>
> Also, if you need speed or are pushing the limits of your RAM, the following approach
> avoids replacing the entire vector:
>
> idx <- is.na(CurrentDataFrame$Name)
> CurrentDataFrame[idx,"Name"] <- PastDataFrame[idx,"Name"]
>
> ---------------------------------------------------------------------------
> Jeff Newmiller The ..... ..... Go Live...
> DCN:<jdnewmil at dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go...
> Live: OO#.. Dead: OO#.. Playing
> Research Engineer (Solar/Batteries O.O#. #.O#. with
> /Software/Embedded Controllers) .OO#. .OO#. rocks...1k
> ---------------------------------------------------------------------------
> Sent from my phone. Please excuse my brevity.
>
> On March 26, 2014 9:44:06 AM PDT, Megan Weigel <mw5wags at gmail.com> wrote:
> >Hello,
> >
> >Hopefully there is an answer for this, but I need an ifelse statement
> >that
> >replaces and returns a value based on a different dataframe. For
> >example:
> >
> >CurrentDataFrame<-ifelse(CurrentDataFrame$Name=="NA",match(CurrentDataFrame$Code
> >with PastDataFrame$Code),replace(CurrentDataFrame$Name) with
> >(PastDataFrame$Name)
> >
> >
> >I hope that makes sense.
> >
> >Thank you very much,
> >
> >Johnson
> >