[R] Extract Data from Website Tables

Jennifer Young jennifer.a.m.young at gmail.com
Sat Mar 22 18:17:41 CET 2014


Hi Doran

I'm also trying to scrape the leaderboard data. Did you happen to figure 
out how to extract each athlete's team/affiliate? I'm trying to write some 
code to work out which teams will qualify once individuals are removed.
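
For reference, here is a rough sketch of the kind of page loop Harold 
describes in the quoted message below. It is not his actual code. The query 
string is copied from Rui's example; the helper names (leaderboard_url, 
parse_page, scrape_all), the fetch and max_pages arguments, and the rule of 
stopping at the first page with no table rows are all my own assumptions, 
as is the guess that the page parameter simply counts up from 0.

```r
library(XML)

# Build the leaderboard URL for one page (query string from Rui's example;
# treating `page` as a zero-based counter is an assumption).
leaderboard_url <- function(page) {
  paste0(
    "http://games.crossfit.com/scores/leaderboard.php?stage=1&sort=0",
    "&division=1&region=0&numberperpage=60&page=", page,
    "&competition=0&frontpage=0&expanded=0&full=1&year=14&showtoggles=0",
    "&hidedropdowns=0&showathleteac=1&athletename="
  )
}

# Parse the first HTML table on a page and apply Rui's clean-up steps.
parse_page <- function(html_lines) {
  doc <- htmlParse(paste(html_lines, collapse = "\n"), asText = TRUE)
  tab <- readHTMLTable(doc, which = 1, header = TRUE,
                       stringsAsFactors = FALSE)
  names(tab) <- gsub("\\s+", "", names(tab))
  tab[] <- lapply(tab, function(x) gsub("\\n", "", x))
  tab
}

# Fetch pages 0, 1, 2, ... until one has no table rows (assumed stopping
# rule) or max_pages is reached, then stack the pages into one data frame.
# `fetch` is a hypothetical hook so the loop can be tried without the site.
scrape_all <- function(fetch = function(p) readLines(leaderboard_url(p),
                                                     warn = FALSE),
                       max_pages = 50) {
  pages <- list()
  for (p in seq_len(max_pages) - 1) {
    tab <- tryCatch(parse_page(fetch(p)), error = function(e) NULL)
    if (is.null(tab) || nrow(tab) == 0) break
    pages[[length(pages) + 1]] <- tab
  }
  do.call(rbind, pages)
}
```

Calling scrape_all() with the defaults would hit the live site; passing a 
fetch function that returns saved HTML lets you check the loop offline. 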

On Sunday, March 2, 2014 2:34:21 PM UTC-5, Doran, Harold wrote:
>
> This is fantastic, thank you. I've modified the code to loop through all 
> the pages and grab all rows of the HTML table. 
>
> Thank you, Rui. 
>
>
>
> On 3/2/14, 5:08 AM, "Rui Barradas" <ruipba... at sapo.pt> wrote: 
>
> >Hello, 
> > 
> >Maybe something like the following. 
> > 
> >#install.packages("XML", dep = TRUE) 
> > 
> >library(XML) 
> > 
> >url <- paste0( 
> >  "http://games.crossfit.com/scores/leaderboard.php?stage=1&sort=0", 
> >  "&division=1&region=0&numberperpage=60&page=0&competition=0", 
> >  "&frontpage=0&expanded=0&full=1&year=14&showtoggles=0", 
> >  "&hidedropdowns=0&showathleteac=1&athletename=" 
> >) 
> >data <- readHTMLTable(readLines(url), which=1, header=TRUE) 
> > 
> >names(data) <- gsub("\\n", "", names(data)) 
> >names(data) <- gsub(" +", "", names(data)) 
> > 
> >data[] <- lapply(data, function(x) gsub("\\n", "", x)) 
> > 
> >str(data) 
> > 
> > 
> >Hope this helps, 
> > 
> >Rui Barradas 
> > 
> >Em 01-03-2014 23:47, Doran, Harold escreveu: 
> >> There is a website that populates a table with athlete scores during a 
> >>competition. I would like to be able to extract those scores from the 
> >>website and place them into a data frame if this is possible. The 
> >>website is at the link below: 
> >> 
> >> http://games.crossfit.com/leaderboard 
> >> 
> >> One complication is that the table shows only a few hundred rows per 
> >> page, so one must click through multiple pages manually. Looking at the 
> >> page source, I think the scores can be fetched from the URL below, but 
> >> I am not sure whether R can somehow read them in, populate a data 
> >> frame, and then grab the data from every page. 
> >> 
> >> 
> >> http://games.crossfit.com/scores/leaderboard.php?loadfromcookies=1&numberperpage=60&full=1&showathleteac=1 
> >> 
> >> I have not done anything like this before, and so any guidance is 
> >>appreciated. 
> >> 
> >> 
> >> Thank you 
> >> 
> >> Harold 
> >> 
> >> 
> >> 
> >> 
> >> ______________________________________________ 
> >> R-h... at r-project.org mailing list 
> >> https://stat.ethz.ch/mailman/listinfo/r-help 
> >> PLEASE do read the posting guide 
> >>http://www.R-project.org/posting-guide.html 
> >> and provide commented, minimal, self-contained, reproducible code. 
> >> 
>

