[R] Extract Data from Website Tables

Rui Barradas ruipbarradas at sapo.pt
Sun Mar 2 11:08:37 CET 2014


Hello,

Maybe something like the following.

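# install.packages("XML", dependencies = TRUE)  # run once if XML is not installed

library(XML)

url <- 
"http://games.crossfit.com/scores/leaderboard.php?stage=1&sort=0&division=1&region=0&numberperpage=60&page=0&competition=0&frontpage=0&expanded=0&full=1&year=14&showtoggles=0&hidedropdowns=0&showathleteac=1&athletename="

# Read the raw HTML and parse the first <table> on the page into a data frame
data <- readHTMLTable(readLines(url), which = 1, header = TRUE)

# Tidy the column names: drop embedded newlines, then runs of spaces
names(data) <- gsub("\\n", "", names(data))
names(data) <- gsub(" +", "", names(data))

# Drop embedded newlines from every cell (columns become character)
data[] <- lapply(data, function(x) gsub("\\n", "", x))

str(data)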
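
Harold also asked about getting every page, not just the first. Below is a minimal sketch, assuming the server pages through results via the page= query parameter in the URL above (page=0 being the first page) and that it returns no table, or simply repeats the last one, once you go past the end. leaderboard_url(), clean_table() and scrape_all() are hypothetical helper names, not functions from the XML package, and the sketch has not been checked against the live site, which may have changed.

```r
# Sketch only: assumes the leaderboard pages through results via the
# 'page' query parameter (0 = first page). Helper names are hypothetical.

# Build the leaderboard URL for one page (same query as above, page varied)
leaderboard_url <- function(page, perpage = 60) {
  paste0("http://games.crossfit.com/scores/leaderboard.php?stage=1&sort=0",
         "&division=1&region=0&numberperpage=", perpage,
         "&page=", page,
         "&competition=0&frontpage=0&expanded=0&full=1&year=14",
         "&showtoggles=0&hidedropdowns=0&showathleteac=1&athletename=")
}

# Same clean-up as above: strip newlines/spaces from names, newlines from cells
clean_table <- function(df) {
  names(df) <- gsub(" +", "", gsub("\\n", "", names(df)))
  df[] <- lapply(df, function(x) gsub("\\n", "", x))
  df
}

# Fetch pages 0, 1, 2, ... and stack them; stop at the first page that
# errors (network or parse), has no rows, or merely repeats the previous page
scrape_all <- function(max_pages = 100, pause = 1) {
  pages <- list()
  for (p in seq_len(max_pages) - 1) {
    tab <- tryCatch(
      XML::readHTMLTable(readLines(leaderboard_url(p)), which = 1, header = TRUE),
      error = function(e) NULL)
    if (is.null(tab) || nrow(tab) == 0) break
    tab <- clean_table(tab)
    if (length(pages) > 0 && identical(tab, pages[[length(pages)]])) break
    pages[[length(pages) + 1]] <- tab
    Sys.sleep(pause)  # be polite to the server between requests
  }
  do.call(rbind, pages)  # NULL if nothing was retrieved
}
```

Then all_scores <- scrape_all() should return one data frame with the pages stacked, provided every page has the same column names (rbind requires that).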


Hope this helps,

Rui Barradas

On 01-03-2014 23:47, Doran, Harold wrote:
> There is a website that populates a table with athlete scores during a competition. I would like to be able to extract those scores from the website and place them into a data frame if this is possible. The website is at the link below:
>
> http://games.crossfit.com/leaderboard
>
> One complication is that one must manually click through multiple pages, as the table only shows a few hundred rows per web page. Looking at the source code of the website, I think I can go here and maybe grab the scores, but I am not sure whether R can somehow read them in from this, populate a data frame, and then grab the data from every page.
>
> http://games.crossfit.com/scores/leaderboard.php?loadfromcookies=1&numberperpage=60&full=1&showathleteac=1
>
> I have not done anything like this before, and so any guidance is appreciated.
>
>
> Thank you
>
> Harold
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
