[R] Extract Data from Website Tables

Doran, Harold HDoran at air.org
Sun Mar 2 20:34:21 CET 2014


This is fantastic, thank you. I've modified the code to loop through all
the pages and grab all rows of the HTML table.
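
A minimal sketch of what such a loop might look like follows. It is not
the code actually used in this thread: page_url(), fetch_page() and
scrape_pages() are hypothetical helper names, and the sketch assumes that
the page= parameter in Rui's URL is a zero-based page index and that a
page past the end yields either an error or an empty table.

```r
# Sketch only; helper names and the stopping rule are assumptions.
# Same URL as in Rui's reply, with the page number left as a %d placeholder.
base_url <- paste0(
  "http://games.crossfit.com/scores/leaderboard.php?stage=1&sort=0",
  "&division=1&region=0&numberperpage=60&page=%d&competition=0&frontpage=0",
  "&expanded=0&full=1&year=14&showtoggles=0&hidedropdowns=0",
  "&showathleteac=1&athletename="
)

# Build the URL for one page (assumed zero-based).
page_url <- function(page) sprintf(base_url, as.integer(page))

# Read and clean the first HTML table on one page, as in Rui's code.
fetch_page <- function(page) {
  tab <- XML::readHTMLTable(readLines(page_url(page), warn = FALSE),
                            which = 1, header = TRUE)
  names(tab) <- gsub("\\n", "", names(tab))
  names(tab) <- gsub(" +", "", names(tab))
  tab[] <- lapply(tab, function(x) gsub("\\n", "", x))
  tab
}

# Loop over pages, stopping at the first page that fails or is empty,
# then stack the per-page data frames into one.
scrape_pages <- function(pages, fetch = fetch_page) {
  out <- list()
  for (p in pages) {
    tab <- tryCatch(fetch(p), error = function(e) NULL)
    if (is.null(tab) || nrow(tab) == 0) break
    out[[length(out) + 1]] <- tab
  }
  do.call(rbind, out)
}
```

Something like scores <- scrape_pages(0:9) would then request the first
ten pages. The fetch argument is there so the loop can be tried with a
stand-in function before hitting the website.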

Thank you, Rui.



On 3/2/14, 5:08 AM, "Rui Barradas" <ruipbarradas at sapo.pt> wrote:

>Hello,
>
>Maybe something like the following.
>
>#install.packages("XML", dep = TRUE)
>
>library(XML)
>
>url <- paste0(
>  "http://games.crossfit.com/scores/leaderboard.php?stage=1&sort=0",
>  "&division=1&region=0&numberperpage=60&page=0&competition=0&frontpage=0",
>  "&expanded=0&full=1&year=14&showtoggles=0&hidedropdowns=0",
>  "&showathleteac=1&athletename=")
>data <- readHTMLTable(readLines(url), which=1, header=TRUE)
>
>names(data) <- gsub("\\n", "", names(data))
>names(data) <- gsub(" +", "", names(data))
>
>data[] <- lapply(data, function(x) gsub("\\n", "", x))
>
>str(data)
>
>
>Hope this helps,
>
>Rui Barradas
>
>On 01-03-2014 23:47, Doran, Harold wrote:
>> There is a website that populates a table with athlete scores during a
>>competition. I would like to be able to extract those scores from the
>>website and place them into a data frame if this is possible. The
>>website is at the link below:
>>
>> http://games.crossfit.com/leaderboard
>>
>> One complication is that one must manually click through multiple pages
>>as the table only populates a few hundred rows on one web page. In
>>looking at the source code of the website, I think I can go to here and
>>maybe grab scores, but I am not sure if R can somehow read them in from
>>this and populate a data frame and subsequently grab data from every
>>page.
>>
>> 
>> http://games.crossfit.com/scores/leaderboard.php?loadfromcookies=1&numberperpage=60&full=1&showathleteac=1
>>
>> I have not done anything like this before, and so any guidance is
>>appreciated.
>>
>>
>> Thank you
>>
>> Harold
>>
>>
>>



