[R] combining data.frames with is.na & match (), two questions

Eric Berger
Thu Apr 18 10:53:07 CEST 2019


Hi Drake,
Petr's suggestion to use the merge() function is good.
Another (possibly overkill) approach is to use functions from the dplyr
package, which is a fantastic package to get familiar with.
For example, the last alternative that Petr suggests, merge(fr2, fr1, all.x=T),
is what is called a "left join": when joining structures x and y, keep all
the rows of x, even where y has no corresponding row.
You can do the same with dplyr as follows:

dplyr::left_join( fr2, fr1, by="Fruit")
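
If you would rather stay in base R, here is a minimal sketch of the same left join written with match() alone. It assumes the fr1 and fr2 data frames from Petr's message (rebuilt here with character columns, via stringsAsFactors = FALSE, so the snippet stands alone) and that each fruit occurs at most once in fr1. The name "joined" is only for illustration. It relies on the fact that indexing a vector with NA returns NA, so the is.na() bookkeeping from the book is not strictly needed.

```r
# Rebuild the two data frames from Petr's message (character columns, not factors).
fr1 <- data.frame(Fruit = c("banana", "pear", "mango"),
                  Calories = c(100L, 100L, 200L),
                  stringsAsFactors = FALSE)
fr2 <- data.frame(Fruit = c("apple", "banana", "pear", "orange", "kiwi"),
                  Color = c("red", "yellow", "green", "orange", "green"),
                  Shape = c("round", "oblong", "pear", "round", "round"),
                  Juice = c(1, 0, 0.5, 1, 0),
                  stringsAsFactors = FALSE)

# For each fruit in fr2, find its row number in fr1 (NA when there is no match).
idx <- match(fr2$Fruit, fr1$Fruit)

# Indexing with NA yields NA, so unmatched fruits get NA calories automatically.
joined <- fr2
joined$Calories <- fr1$Calories[idx]
joined
```

Unlike merge(), which sorts the result by the key column by default, this keeps the rows in fr2's original order, which is the layout Drake's book produces.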

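On your first question, it may help to run the book's line piece by piece. The sketch below uses plain vectors standing in for fruitData$Fruit, fruitNutr$Fruit and fruitNutr$Calories (the names fruit, nutr_fruit, nutr_cal, found, rows and values are mine, not the book's) and evaluates each sub-expression on its own.

```r
# Stand-ins for fruitData$Fruit, fruitNutr$Fruit and fruitNutr$Calories.
fruit      <- c("apple", "banana", "pear", "orange", "kiwi")
nutr_fruit <- c("banana", "pear", "mango")
nutr_cal   <- c(100, 100, 200)

# Where does each fruit sit in nutr_fruit? NA means "not there".
index <- match(fruit, nutr_fruit)     # NA 1 2 NA NA

# Which positions found a match?
found <- !is.na(index)                # FALSE TRUE TRUE FALSE FALSE

# Keep only the real row numbers, dropping the NAs.
rows <- index[found]                  # 1 2

# Look up the calories stored at those row numbers.
values <- nutr_cal[rows]              # 100 100

# Write those values into the matched positions only.
calories <- rep(NA_real_, length(fruit))
calories[found] <- values
calories                              # NA 100 100 NA NA
```

So index[!is.na(index)] is just "the row numbers in fruitNutr of the fruits that were found", and the outer fruitNutr$Calories[...] fetches the calories stored at those rows. Both sides of the assignment then have length 2, so the values line up with the matched fruits.
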
HTH,
Eric


On Thu, Apr 18, 2019 at 11:40 AM PIKAL Petr <petr.pikal using precheza.cz> wrote:

> Hi
>
> I wonder why such a combination is made so complicated in your textbook.
>
> Having data frames fr1 and fr2
>
> > dput(fr1)
> structure(list(Fruit = structure(c(1L, 3L, 2L), .Label = c("banana",
> "mango", "pear"), class = "factor"), Calories = c(100L, 100L,
> 200L)), class = "data.frame", row.names = c("1", "2", "3"))
> > dput(fr2)
> structure(list(Fruit = structure(c(1L, 2L, 5L, 4L, 3L), .Label = c("apple",
> "banana", "kiwi", "orange", "pear"), class = "factor"), Color =
> structure(c(3L,
> 4L, 1L, 2L, 1L), .Label = c("green", "orange", "red", "yellow"
> ), class = "factor"), Shape = structure(c(3L, 1L, 2L, 3L, 3L), .Label =
> c("oblong",
> "pear", "round"), class = "factor"), Juice = c(1, 0, 0.5, 1,
> 0)), class = "data.frame", row.names = c("1", "2", "3", "4",
> "5"))
> >
>
> > fr1
>    Fruit Calories
> 1 banana      100
> 2   pear      100
> 3  mango      200
> >
>
> You can use merge() to combine those two data frames and keep either all rows
> from both
>
> > merge(fr2, fr1, all=T)
>    Fruit  Color  Shape Juice Calories
> 1  apple    red  round   1.0       NA
> 2 banana yellow oblong   0.0      100
> 3   kiwi  green  round   0.0       NA
> 4 orange orange  round   1.0       NA
> 5   pear  green   pear   0.5      100
> 6  mango   <NA>   <NA>    NA      200
>
> just values from data frame with calories
>
> > merge(fr2, fr1, all.y=T)
>    Fruit  Color  Shape Juice Calories
> 1 banana yellow oblong   0.0      100
> 2   pear  green   pear   0.5      100
> 3  mango   <NA>   <NA>    NA      200
>
> or just values from data frame with colours
>
> > merge(fr2, fr1, all.x=T)
>    Fruit  Color  Shape Juice Calories
> 1  apple    red  round   1.0       NA
> 2 banana yellow oblong   0.0      100
> 3   kiwi  green  round   0.0       NA
> 4 orange orange  round   1.0       NA
> 5   pear  green   pear   0.5      100
>
> Cheers
> Petr
>
>
> > -----Original Message-----
> > From: R-help <r-help-bounces using r-project.org> On Behalf Of Drake Gossi
> > Sent: Thursday, April 18, 2019 1:24 AM
> > To: r-help using r-project.org
> > Subject: [R] combining data.frames with is.na & match (), two questions
> >
> > Hello everyone,
> >
> > I'm working through this book, *Humanities Data in R* (Arnold & Tilton), and
> > I'm just having trouble understanding this maneuver.
> >
> > In sum, I'm trying to combine data in two different data.frames.
> >
> > This data.frame is called fruitNutr
> >
> >    Fruit Calories
> > 1 banana      100
> > 2   pear      100
> > 3  mango      200
> >
> > And this data.frame is called fruitData
> >
> >    Fruit  Color  Shape Juice
> > 1  apple    red  round   1.0
> > 2 banana yellow oblong   0.0
> > 3   pear  green   pear   0.5
> > 4 orange orange  round   1.0
> > 5   kiwi  green  round   0.0
> >
> > So, as you can see, these two data.frames overlap insofar as they both have
> > banana and pear. So, what happens next is the book suggests this:
> >
> > fruitData$Calories <- NA
> >
> >
> > As a result, I've created a new column for the fruitData data.frame:
> >
> >    Fruit  Color  Shape Juice Calories
> > 1  apple    red  round   1.0       NA
> > 2 banana yellow oblong   0.0       NA
> > 3   pear  green   pear   0.5       NA
> > 4 orange orange  round   1.0       NA
> > 5   kiwi  green  round   0.0       NA
> >
> > Then:
> >
> > > index <- match(x=fruitData$Fruit, table=fruitNutr$Fruit)
> > > index
> > [1] NA  1  2 NA NA
> > > is.na(index)
> > [1]  TRUE FALSE FALSE  TRUE  TRUE
> > > fruitData$Calories[!is.na(index)] <- fruitNutr$Calories[index[!is.na(index)]]
> > > fruitData
> >
> >    Fruit  Color  Shape Juice Calories
> > 1  apple    red  round   1.0       NA
> > 2 banana yellow oblong   0.0      100
> > 3   pear  green   pear   0.5      100
> > 4 orange orange  round   1.0       NA
> > 5   kiwi  green  round   0.0       NA
> >
> > I get what the first part means, that first part being this:
> > fruitData$Calories[!is.na(index)]
> > Go into the fruitData data.frame, specifically into the Calories column, and
> > only for what's TRUE according to !is.na(index). But I just literally can't
> > understand this last part: fruitNutr$Calories[index[!is.na(index)]]
> >
> > Two questions.
> >
> >
> >    1. I just literally don't understand how this code works. It does work,
> >    of course, but I don't know what it's doing, specifically this
> >    [index[!is.na(index)]] part. Could someone explain it to me like I'm five?
> >    I'm new at this...
> >    2. And then: is there any other way to combine these two data.frames so
> >    that we get this same result? Maybe an easier-to-understand method?
> >
> > That same result, again, is
> >
> >    Fruit  Color  Shape Juice Calories
> > 1  apple    red  round   1.0       NA
> > 2 banana yellow oblong   0.0      100
> > 3   pear  green   pear   0.5      100
> > 4 orange orange  round   1.0       NA
> > 5   kiwi  green  round   0.0       NA
> >
> >
> > Drake
> >
> >
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
>
