[Rd] extracting tables from web pages?

Dirk Eddelbuettel edd at debian.org
Thu Apr 25 22:19:45 CEST 2013


On 25 April 2013 at 13:00, Spencer Graves wrote:
| Hello:
| 
| 
|        What tools would you recommend for extracting the table of
| members of the US House of Representatives from
| "http://house.gov/representatives/" and 
| "http://en.wikipedia.org/wiki/List_of_current_members_of_the_United_States_House_of_Representatives_by_age"? 
|
|        I started writing something using getURL{RCurl}.  However, I'm 
| getting bogged down manually selecting character sequences to search for 
| and split on.

You could try your own sos package to search for what others have done here.
The XML package is popular for this, but the whole approach is fraught with
little pitfalls: HTML is very definitely not a good format for data delivery,
and an HTML page is clearly no API for data access.
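
For illustration only, here is a minimal sketch of the XML-package route,
assuming the page contains an ordinary <table> element. The inline HTML string
and its member names are hypothetical stand-ins, not data from either site; a
real page would first be fetched (for example with getURL{RCurl}) and will
likely need more cleanup than this.

```r
library(XML)

# Hypothetical stand-in for a downloaded page (not real data from either site).
html_text <- '<html><body>
<table>
  <tr><th>Name</th><th>State</th></tr>
  <tr><td>Member A</td><td>CA</td></tr>
  <tr><td>Member B</td><td>TX</td></tr>
</table>
</body></html>'

# Parse the text as HTML; asText = TRUE says the argument is content, not a file or URL.
doc <- htmlParse(html_text, asText = TRUE)

# readHTMLTable() returns a list with one data frame per <table> in the document.
tables <- readHTMLTable(doc, header = TRUE, stringsAsFactors = FALSE)

# This toy page has a single table; a real page may have several.
members <- tables[[1]]
print(members)
```

On a real page it is worth checking length(tables) and the dimensions of each
element before picking one, since layout tables and navigation boxes tend to
show up in the list as well.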

Dirk

-- 
Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com


