[R] Downloading data from internet
Duncan Temple Lang
duncan at wald.ucdavis.edu
Thu Sep 24 17:56:06 CEST 2009
Thanks for explaining this, Charlie.
Just for completeness and to make things a little easier,
the XML package has a function named readHTMLTable()
and you can call it with a URL and it will attempt
to read all the tables in the page.
tbls = readHTMLTable('http://www.rateinflation.com/consumer-price-index/usa-cpi.php')
yields a list with 10 elements, and the table of interest with the data is the 10th one.
tbls[[10]]
The function does the XPath voodoo and sapply() work for you and uses some heuristics.
There are various controls one can specify and also various methods for working
with sub-parts of the HTML document directly.
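
As a rough sketch of those controls (not run against the live page), the
snippet below uses a small inline HTML string as a stand-in for a downloaded
document. The table contents are made-up illustrative values, not real CPI
figures. It assumes an installed version of the XML package that has the
`which` and `stringsAsFactors` arguments of readHTMLTable(), and shows
reading a single table by position and reading a table node selected with
XPath.

```r
library(XML)

# Hypothetical stand-in for a downloaded page: two small tables.
# The numbers are invented for illustration only.
html <- paste0(
  '<html><body>',
  '<table><tr><th>a</th></tr><tr><td>1</td></tr></table>',
  '<div class="dynamicContent"><table>',
  '<tr><th>Year</th><th>Jan</th><th>Feb</th></tr>',
  '<tr><td>2008</td><td>100.0</td><td>100.5</td></tr>',
  '<tr><td>2009</td><td>101.0</td><td>101.5</td></tr>',
  '</table></div>',
  '</body></html>')

doc <- htmlParse(html, asText = TRUE)

# Control 1: read only the second table in the document and keep the
# cell contents as character strings rather than factors.
cpi <- readHTMLTable(doc, which = 2, header = TRUE, stringsAsFactors = FALSE)

# Control 2: select the table node yourself with XPath and pass that
# node, rather than the whole document, to readHTMLTable().
node <- getNodeSet(doc, '//div[@class="dynamicContent"]/table')[[1]]
cpi2 <- readHTMLTable(node, header = TRUE, stringsAsFactors = FALSE)

# The cells arrive as text, so convert the columns to numbers explicitly.
cpi[] <- lapply(cpi, as.numeric)
```

With a real page you would pass the URL (or the parsed document) in place of
`doc` and inspect the result to find which table index holds the data.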
D.
cls59 wrote:
>
>
> Bogaso wrote:
>> Hi all,
>>
>> I want to download data from these two different sources directly into R:
>>
>> http://www.rateinflation.com/consumer-price-index/usa-cpi.php
>> http://eaindustry.nic.in/asp2/list_d.asp
>>
>> The first one is the CPI of the US and the second is the WPI of India. Can
>> anyone please give me a clue how to download them directly into R? I want to
>> make them zoo objects for further analysis.
>>
>> Thanks,
>>
>
> The following site did not load for me:
>
> http://eaindustry.nic.in/asp2/list_d.asp
>
> But I was able to extract the table from the US CPI site using Duncan Temple
> Lang's XML package:
>
> library(XML)
>
>
> First, download the website into R:
>
> html.raw <- readLines(
> 'http://www.rateinflation.com/consumer-price-index/usa-cpi.php' )
>
> Then, convert to an HTML object using the XML package:
>
> html.data <- htmlTreeParse( html.raw, asText = TRUE, useInternalNodes = TRUE )
>
> A quick scan of the page source in the browser reveals that the table you
> want is enclosed in a div with a class of "dynamicContent", so we will use an
> XPath expression[1] to retrieve all rows in that table:
>
> table.html <- getNodeSet( html.data,
> '//div[@class="dynamicContent"]/table/tr' )
>
> Now, the data values can be extracted from the cells in the rows using a
> little sapply and xpathSApply voodoo:
>
> table.data <- t( sapply( table.html, function( row ){
>
> row.data <- xpathSApply( row, './td', xmlValue )
> return( row.data)
>
> }))
>
>
> Good luck!
>
> -Charlie
>
> [1]: http://www.w3schools.com/XPath/xpath_syntax.asp
>
> -----
> Charlie Sharpsteen
> Undergraduate
> Environmental Resources Engineering
> Humboldt State University