[R] Lahman Baseball Data Using R DBI Package

William Michels wjm1 @end|ng |rom c@@@co|umb|@@edu
Thu Oct 8 06:56:59 CEST 2020


Hi Philip,

You've probably realized by now that R doesn't like column names that
start with a number. If you try to access an R-dataframe column named
2B or 3B with the familiar "$" notation, you'll get an error:

> library(DBI)
> library(RSQLite)
> con2 <- dbConnect(SQLite(), "~/R_Dir/lahmansbaseballdb.sqlite")
> Hack12Batting <- dbGetQuery(con2,"SELECT * FROM batting WHERE yearID = 2018 AND AB >600 ORDER BY AB DESC")
> Hack12Batting$AB
 [1] 664 661 639 632 632 632 626 623 620 618 617 613 606 605 602
> Hack12Batting$3B
Error: unexpected numeric constant in "Hack12Batting$3"

How to handle? You can rename columns on-the-fly by piping. See
reference [1] and use either library(magrittr) or library(dplyr) or a
combination thereof:

library(magrittr)
dbGetQuery(con2,"SELECT * FROM batting WHERE yearID = 2018 AND AB >600
ORDER BY AB DESC") %>% set_colnames(make.names(colnames(.)))

#OR one of the following:

library(dplyr)
dbGetQuery(con2,"SELECT * FROM batting WHERE yearID = 2018 AND AB >600
ORDER BY AB DESC") %>% rename(X2B = `2B`, X3B = `3B`)

library(dplyr)
dbGetQuery(con2,"SELECT * FROM batting WHERE yearID = 2018 AND AB >600
ORDER BY AB DESC") %>% `colnames<-`(make.names(colnames(.)))

library(dplyr)
dbGetQuery(con2,"SELECT * FROM batting WHERE yearID = 2018 AND AB >600
ORDER BY AB DESC") %>% magrittr::set_colnames(make.names(colnames(.)))

Best, Bill.

W. Michels, Ph.D.

[1] https://stackoverflow.com/questions/28100780/use-with-replacement-functions-like-colnames










On Fri, Oct 2, 2020 at 7:34 PM Bill Dunlap <williamwdunlap using gmail.com> wrote:
>
> The double quotes are required by SQL if a name is not of the form
> letter-followed-by-any-number-of-letters-or-numbers or if the name is a SQL
> keyword like 'where' or 'select'.  If you are doing this from a function,
> you may as well quote all the names.
>
> -Bill
>
> On Fri, Oct 2, 2020 at 6:18 PM Philip <herd_dog using cox.net> wrote:
>
> > The \”2B\” worked.  Have no idea why.  Can you point me somewhere that can
> > explain this to me.
> >
> > Thanks,
> > Philip
> >
> > *From:* Bill Dunlap
> > *Sent:* Friday, October 2, 2020 3:54 PM
> > *To:* Philip
> > *Cc:* r-help
> > *Subject:* Re: [R] Lahman Baseball Data Using R DBI Package
> >
> > Have you tried putting double quotes around 2B and 3B:  "...2B, 3B, ..."
> > -> "...\"2B\",\"3B\",..."?
> >
> > -Bill
> >
> > On Fri, Oct 2, 2020 at 3:49 PM Philip <herd_dog using cox.net> wrote:
> >
> >> I’m trying to pull data from one table (batting) in the Lahman Baseball
> >> database.  Notice X2B for doubles and X3B for triples – fourth and fifth
> >> from the right.
> >>
> >> The dbGetQuery function runs fine when I leave there two out but I get
> >> error messages (in red) when I include 2B/3B or X2B/X3B.
> >>
> >> Can anyone give me some direction?
> >>
> >> Thanks,
> >> Philip Heinrich
> >>
> >> ***************************************************************************************************************************************************
> >> tail(dbReadTable(Lahman,"batting"))
> >>
> >> ID                       playerID      yearID    stint teamID team_ID
> >> lgID   G        G_batting   AB     R     H       X2B    X3B   HR   RBI   SB
> >> 107414 107414 yastrmi01      2019       1       SFN       2920
> >> NL     107        NA          371   64  101      22       3     21
> >> 55    2
> >> 107416 107416 yelicch01      2019        1       MIL       2911
> >> NL     130        NA          489 100  161      29       3     44    97   30
> >> 107419 107419 youngal01     2019       1       ARI        2896
> >> NL       17        NA            25     1      1        0       0
> >> 0      0     0
> >> 107420 107420 zagunma01   2019        1      CHN       2901      NL
> >> 30        NA             36     2      9        3       0      0
> >> 5     0
> >> 107422 107422 zavalse01      2019        1      CHA       2900
> >> AL        5        NA             12     1      1        0       0
> >> 0       0     0
> >> 107427 107427 zimmery01     2019        1      WAS      2925      NL
> >> 52        NA           171   20    44        9      0      6      27     0
> >> 107428 107428 zobribe01       2019        1      CHN      2901
> >> NL      47        NA           150   24   39         5      0      1
> >> 17     0
> >> 107429 107429 zuninmi01       2019        1      TBA       2922
> >> AL      90        NA           266    30  44       10      1      9
> >> 32     0
> >>
> >>
> >> Hack11Batting <- dbGetQuery(Lahman,"SELECT
> >> playerID,yearID,AB,R,H,2B,3B,HR,
> >>                             RBI,SB,CS,BB,SO,IBB,HBP,SH,SF,GIDP FROM
> >> batting
> >>                             WHERE yearID = 2018 AND AB >99")
> >> Error: unrecognized token: "2B"
> >>
> >> Hack11Batting <- dbGetQuery(Lahman,"SELECT
> >> playerID,yearID,AB,R,H,X2B,X3B,HR,
> >>                             RBI,SB,CS,BB,SO,IBB,HBP,SH,SF,GIDP FROM
> >> batting
> >>                             WHERE yearID = 2018 AND AB >99")
> >> Error: no such column: X2B
> >>
> >>         [[alternative HTML version deleted]]
> >>
> >> ______________________________________________
> >> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list