[R] Web-scraping newbie - dynamic table into R?

John Kane jrkrideau using gmail.com
Mon Apr 20 13:26:43 CEST 2020


Hi Julio,

I am just working on my first cup of tea of the morning, so I am not
functioning all that well, but I finally noticed that we have dropped the
R-help list.  I have put it back as a recipient, as there are a lot of
people there who know about 99%+ more than I do about the topic.

I'll keep poking around and see what I can find.
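
One possible explanation, offered as a guess rather than a diagnosis:
read_html() only parses the HTML the server sends and does not run any
JavaScript, and the "#tab-winningNumbers" part of the URL is a fragment that
the browser handles itself and never sends to the server.  So a table that a
script builds in the browser would not be in what rvest sees.  The sketch
below shows that effect on a made-up page; `page_source`, the "odds" table and
the "winning-numbers" container are hypothetical stand-ins, not the lottery
site's real markup.

```r
library(rvest)

# The fragment is a client-side part of the URL.
keno_url <- "https://www.galottery.com/en-us/games/draw-games/keno.html#tab-winningNumbers"
fragment <- xml2::url_parse(keno_url)$fragment

# Hypothetical stand-in for a page as delivered by a server: one static table
# and an empty container that a script would fill in later, in the browser.
page_source <- '
<html><body>
  <table id="odds">
    <tr><th>Numbers Matched</th><th>Prize</th></tr>
    <tr><td>10</td><td>$100,000</td></tr>
    <tr><td>9</td><td>$5,000</td></tr>
  </table>
  <div id="winning-numbers"></div>
  <script>/* a browser would build the draws table here */</script>
</body></html>'

page <- read_html(page_source)

# Only tables that exist in the delivered HTML are found.
static_tables <- html_table(page)
length(static_tables)

# The container exists, but no table was ever created inside it.
draws <- html_nodes(page, "#winning-numbers table")
length(draws)
```

If this is what is happening on the Keno page, the draws presumably arrive
through a separate request made by the page's script, which the browser's
developer tools (Network tab) should show.  That is an assumption about this
site, not something I have checked.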

On Sun, 19 Apr 2020 at 22:34, Julio Farach <jfarach using gmail.com> wrote:

> John,
>
> I again thank you for the reply and continued support.  After a few hours,
> I arrived at the point you describe below; namely extracting elements, but
> from a different tab than the Last 10 Draws, or Winning Numbers tab.
>
> On the website, there are 5 tabs.  The elements you describe below are
> from the 3rd tab, "Odds & Prizes."  Instead of results, that tab describes
> the general odds of the Keno game.  But, I'm seeking the last 10 draws
> shown on the "Winning Numbers," or 4th tab.  I've played around with a CSS
> Selector tool, but I'm unable to extract any details (e.g., a draw number
> or Keno number) from the 4th tab.  I could extract elements of other tabs,
> like you did below, from the 3rd tab.
>
> Please let me know if you learn more or if you have other ideas for me to
> consider.
>
> Regards,
> Julio
>
> On Sun, Apr 19, 2020 at 7:00 PM John Kane <jrkrideau using gmail.com> wrote:
>
>> I am a complete newbie too, but try this:
>> library(rvest)
>> Kenopage <- "https://www.galottery.com/en-us/games/draw-games/keno.html#tab-winningNumbers"
>>
>> Keno <- read_html(Kenopage)
>>
>> tt <- html_table(Keno, fill = TRUE)
>>
>> This should give you a list with 10 elements, each of which should be a
>> data.frame
>> Example
>>
>> ken1  <-  tt[[1]]
>> str(ken1)
>>
>> > str(ken1)
>> 'data.frame': 12 obs. of  4 variables:
>>  $ Numbers Matched         : chr  "10" "9" "8" "7" ...
>>  $ Base Keno! Prize        : chr  "$100,000*" "$5,000" "$500" "$50" ...
>>  $ + Bulls-Eye Prize       : chr  "$200,000*" "$20,000" "$1,500" "$100" ...
>>  $ Keno! w/ Bulls-Eye Prize: chr  "$300,000" "$25,000" "$2,000" "$150" ...
>> >
>>
>> I figured this out a little while ago and just manually stepped through
>> the data.frames to get what I wanted. Brute force and stupidity, but it
>> worked.
>>
>> Someday I may figure out how to use things like SelectorGadget!
>>
>>
>>
>>
>> On Sun, 19 Apr 2020 at 17:46, Julio Farach <jfarach using gmail.com> wrote:
>>
>>> John - I corrected my email below for typos.
>>>
>>> On Sun, Apr 19, 2020 at 5:42 PM Julio Farach <jfarach using gmail.com> wrote:
>>>
>>>> John,
>>>>
>>>> Yes, while I can execute the line of code that I provided, I am still
>>>> unable to capture the table shown in the browser.  The last 10 draws are
>>>> shown in a table if you view the page:
>>>>
>>>> https://www.galottery.com/en-us/games/draw-games/keno.html#tab-winningNumbers
>>>>
>>>>
>>>> But, despite using CSS and XPath combinations of
>>>> >html_nodes(x, CSS or XPath)
>>>> I am unable to copy that table into R.
>>>>
>>>> One commenter on another forum received an error and suggested that
>>>> perhaps bots lack permission to access the page.  But, I've used the
>>>> robotstxt package to ensure that bots are indeed permitted.
>>>>
>>>> Any thoughts?
>>>>
>>>> Regards,
>>>> Julio
>>>>
>>>> On Sun, Apr 19, 2020 at 4:38 PM John Kane <jrkrideau using gmail.com> wrote:
>>>>
>>>>> Keno <- read_html(Kenopage) ?
>>>>>
>>>>> Or am I misunderstanding the problem?
>>>>>
>>>>> On Sun, 19 Apr 2020 at 15:10, Julio Farach <jfarach using gmail.com> wrote:
>>>>>
>>>>>> How do I scrape the last 10 Keno draws from the Georgia lottery into R?
>>>>>>
>>>>>>
>>>>>> I'm trying to pull the last 10 draws of a Keno lottery game into R.
>>>>>> I've read several tutorials on how to scrape websites using the rvest
>>>>>> package, Chrome's Inspect Element, and CSS or XPath, but I'm likely
>>>>>> stuck because the table I seek is dynamically generated using
>>>>>> JavaScript.
>>>>>>
>>>>>>
>>>>>>
>>>>>> I started with:
>>>>>>
>>>>>> > install.packages("rvest")
>>>>>> > library(rvest)
>>>>>> > Kenopage <- "https://www.galottery.com/en-us/games/draw-games/keno.html#tab-winningNumbers"
>>>>>> > Keno <- read_html(Kenopage)
>>>>>>
>>>>>> From there, I've been unable to progress, despite hours spent on
>>>>>> combinations of CSS and XPath calls with "html_nodes."
>>>>>>
>>>>>> Failed example: DrawNumber <- Keno %>% rvest::html_nodes("body") %>%
>>>>>> xml2::xml_find_all("//span[contains(@class,'Draw Number')]") %>%
>>>>>> rvest::html_text()
>>>>>>
>>>>>>
>>>>>>
>>>>>> Someone mentioned using the V8 package in R, but it's new to me.
>>>>>>
>>>>>> How do I get started?
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Julio Farach
>>>>>> https://www.linkedin.com/in/farach
>>>>>> cell phone:  804/363-2161
>>>>>> email:  JFarach using gmail.com
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> John Kane
>>>>> Kingston ON Canada
>>>>>
>>>>
>>>>
>>>
>>
>
>

-- 
John Kane
Kingston ON Canada
