[R] Lahman Baseball Data Using R DBI Package
William Michels
wjm1 @end|ng |rom c@@@co|umb|@@edu
Thu Oct 8 06:56:59 CEST 2020
Hi Philip,
You've probably realized by now that R doesn't like column names that
start with a number. If you try to access an R-dataframe column named
2B or 3B with the familiar "$" notation, you'll get an error:
> library(DBI)
> library(RSQLite)
> con2 <- dbConnect(SQLite(), "~/R_Dir/lahmansbaseballdb.sqlite")
> Hack12Batting <- dbGetQuery(con2,"SELECT * FROM batting WHERE yearID = 2018 AND AB >600 ORDER BY AB DESC")
> Hack12Batting$AB
[1] 664 661 639 632 632 632 626 623 620 618 617 613 606 605 602
> Hack12Batting$3B
Error: unexpected numeric constant in "Hack12Batting$3"
How to handle? You can rename columns on-the-fly by piping. See
reference [1] and use either library(magrittr) or library(dplyr) or a
combination thereof:
library(magrittr)
dbGetQuery(con2,"SELECT * FROM batting WHERE yearID = 2018 AND AB >600
ORDER BY AB DESC") %>% set_colnames(make.names(colnames(.)))
#OR one of the following:
library(dplyr)
dbGetQuery(con2,"SELECT * FROM batting WHERE yearID = 2018 AND AB >600
ORDER BY AB DESC") %>% rename(X2B = `2B`, X3B = `3B`)
library(dplyr)
dbGetQuery(con2,"SELECT * FROM batting WHERE yearID = 2018 AND AB >600
ORDER BY AB DESC") %>% `colnames<-`(make.names(colnames(.)))
library(dplyr)
dbGetQuery(con2,"SELECT * FROM batting WHERE yearID = 2018 AND AB >600
ORDER BY AB DESC") %>% magrittr::set_colnames(make.names(colnames(.)))
Best, Bill.
W. Michels, Ph.D.
[1] https://stackoverflow.com/questions/28100780/use-with-replacement-functions-like-colnames
On Fri, Oct 2, 2020 at 7:34 PM Bill Dunlap <williamwdunlap using gmail.com> wrote:
>
> The double quotes are required by SQL if a name is not of the form
> letter-followed-by-any-number-of-letters-or-numbers or if the name is a SQL
> keyword like 'where' or 'select'. If you are doing this from a function,
> you may as well quote all the names.
>
> -Bill
>
> On Fri, Oct 2, 2020 at 6:18 PM Philip <herd_dog using cox.net> wrote:
>
> > The \”2B\” worked. Have no idea why. Can you point me somewhere that can
> > explain this to me.
> >
> > Thanks,
> > Philip
> >
> > *From:* Bill Dunlap
> > *Sent:* Friday, October 2, 2020 3:54 PM
> > *To:* Philip
> > *Cc:* r-help
> > *Subject:* Re: [R] Lahman Baseball Data Using R DBI Package
> >
> > Have you tried putting double quotes around 2B and 3B: "...2B, 3B, ..."
> > -> "...\"2B\",\"3B\",..."?
> >
> > -Bill
> >
> > On Fri, Oct 2, 2020 at 3:49 PM Philip <herd_dog using cox.net> wrote:
> >
> >> I’m trying to pull data from one table (batting) in the Lahman Baseball
> >> database. Notice X2B for doubles and X3B for triples – fourth and fifth
> >> from the right.
> >>
> >> The dbGetQuery function runs fine when I leave there two out but I get
> >> error messages (in red) when I include 2B/3B or X2B/X3B.
> >>
> >> Can anyone give me some direction?
> >>
> >> Thanks,
> >> Philip Heinrich
> >>
> >> ***************************************************************************************************************************************************
> >> tail(dbReadTable(Lahman,"batting"))
> >>
> >> ID playerID yearID stint teamID team_ID
> >> lgID G G_batting AB R H X2B X3B HR RBI SB
> >> 107414 107414 yastrmi01 2019 1 SFN 2920
> >> NL 107 NA 371 64 101 22 3 21
> >> 55 2
> >> 107416 107416 yelicch01 2019 1 MIL 2911
> >> NL 130 NA 489 100 161 29 3 44 97 30
> >> 107419 107419 youngal01 2019 1 ARI 2896
> >> NL 17 NA 25 1 1 0 0
> >> 0 0 0
> >> 107420 107420 zagunma01 2019 1 CHN 2901 NL
> >> 30 NA 36 2 9 3 0 0
> >> 5 0
> >> 107422 107422 zavalse01 2019 1 CHA 2900
> >> AL 5 NA 12 1 1 0 0
> >> 0 0 0
> >> 107427 107427 zimmery01 2019 1 WAS 2925 NL
> >> 52 NA 171 20 44 9 0 6 27 0
> >> 107428 107428 zobribe01 2019 1 CHN 2901
> >> NL 47 NA 150 24 39 5 0 1
> >> 17 0
> >> 107429 107429 zuninmi01 2019 1 TBA 2922
> >> AL 90 NA 266 30 44 10 1 9
> >> 32 0
> >>
> >>
> >> Hack11Batting <- dbGetQuery(Lahman,"SELECT
> >> playerID,yearID,AB,R,H,2B,3B,HR,
> >> RBI,SB,CS,BB,SO,IBB,HBP,SH,SF,GIDP FROM
> >> batting
> >> WHERE yearID = 2018 AND AB >99")
> >> Error: unrecognized token: "2B"
> >>
> >> Hack11Batting <- dbGetQuery(Lahman,"SELECT
> >> playerID,yearID,AB,R,H,X2B,X3B,HR,
> >> RBI,SB,CS,BB,SO,IBB,HBP,SH,SF,GIDP FROM
> >> batting
> >> WHERE yearID = 2018 AND AB >99")
> >> Error: no such column: X2B
> >>
> >> [[alternative HTML version deleted]]
> >>
> >> ______________________________________________
> >> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list