[R] Downloading data from the internet

Bogaso bogaso.christofer at gmail.com
Fri Sep 25 13:07:32 CEST 2009


Thank you so much for all the help. However, I need a little more. On the
site
"http://www.rateinflation.com/consumer-price-index/usa-historical-cpi.php",
if I scroll down there is an option "Historical CPI Index For USA". If I
then click "Get Data", another table pops up, but without any significant
change in the address bar. This table holds more data, starting from 1999.
Can you please help me get the values of this table?
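A table that changes without the address bar changing usually means the "Get Data" button submits an HTML form via POST. If that is the case here, a request along the following lines may reproduce it. This is only a sketch: the form field names below ("year_from", "year_to") are guesses for illustration, not taken from the site — you would need to inspect the page source and copy the real <input> names from the form.

```r
# Sketch: fetch a POST-generated page with RCurl, then parse its tables.
# CAUTION: the field names "year_from"/"year_to" are hypothetical -- view
# the page source and use the real field names from the form.
library(RCurl)
library(XML)

url <- "http://www.rateinflation.com/consumer-price-index/usa-historical-cpi.php"

response <- postForm(url,
                     year_from = "1999",
                     year_to   = "2009")

doc  <- htmlParse(response, asText = TRUE)
tbls <- readHTMLTable(doc)
str(tbls)   # inspect the list to find the table of interest
```

If the pop-up is driven by JavaScript rather than a plain form submission, RCurl alone will not trigger it; a browser's network inspector can then show the actual request being made so it can be replayed from R.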

Thanks


Duncan Temple Lang wrote:
> 
> 
> Thanks for explaining this, Charlie.
> 
> Just for completeness and to make things a little easier,
> the XML package has a function named readHTMLTable()
> and you can call it with a URL and it will attempt
> to read all the tables in the page.
> 
>  tbls =
> readHTMLTable('http://www.rateinflation.com/consumer-price-index/usa-cpi.php')
> 
> yields a list with 10 elements, and the table of interest with the data is
> the 10th one.
> 
>  tbls[[10]]
> 
> The function does the XPath voodoo and sapply() work for you and uses some
> heuristics.
> There are various controls one can specify and also various methods for
> working
> with sub-parts of the HTML document directly.
> 
>   D.
> 
> 
> 
> cls59 wrote:
>> 
>> 
>> Bogaso wrote:
>>> Hi all,
>>>
>>> I want to download data from these two different sources directly into
>>> R:
>>>
>>> http://www.rateinflation.com/consumer-price-index/usa-cpi.php
>>> http://eaindustry.nic.in/asp2/list_d.asp
>>>
>>> The first one is the CPI of the US and the second one is the WPI of
>>> India. Can anyone please give me a clue how to download them directly
>>> into R? I want to make them zoo objects for further analysis.
>>>
>>> Thanks,
>>>
>> 
>> The following site did not load for me:
>> 
>> http://eaindustry.nic.in/asp2/list_d.asp
>> 
>> But I was able to extract the table from the US CPI site using Duncan
>> Temple
>> Lang's XML package:
>> 
>>   library(XML)
>> 
>> 
>> First, download the website into R:
>> 
>>   html.raw <- readLines(
>> 'http://www.rateinflation.com/consumer-price-index/usa-cpi.php' )
>> 
>> Then, convert to an HTML object using the XML package:
>> 
>>   html.data <- htmlTreeParse( html.raw, asText = TRUE,
>>     useInternalNodes = TRUE )
>> 
>> A quick scan of the page source in the browser reveals that the table you
>> want is enclosed in a div with a class of "dynamicContent" -- we will use
>> an XPath expression[1] to retrieve all rows in that table:
>> 
>>   table.html <- getNodeSet( html.data,
>> '//div[@class="dynamicContent"]/table/tr' )
>> 
>> Now, the data values can be extracted from the cells in the rows using a
>> little sapply() and xpathSApply() voodoo:
>> 
>>   table.data <- t( sapply( table.html, function( row ){
>> 
>>     row.data <-  xpathSApply( row, './td', xmlValue )
>>     return( row.data)
>> 
>>   }))
>> 
>> 
>> Good luck!
>> 
>> -Charlie
>>  
>>   [1]:  http://www.w3schools.com/XPath/xpath_syntax.asp
>> 
>> -----
>> Charlie Sharpsteen
>> Undergraduate
>> Environmental Resources Engineering
>> Humboldt State University
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> 
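Following up on the extraction recipe quoted above: the original question mentioned wanting zoo objects. Once table.data is in hand, the conversion might look like the sketch below. The stand-in matrix and its layout (year in column 1, twelve monthly values in columns 2 through 13) are assumptions for illustration only — check the real output of the previous step before relying on this.

```r
# Sketch: convert an extracted character matrix into a zoo series.
# The matrix below is a stand-in with an ASSUMED layout -- year in
# column 1, twelve monthly values in columns 2 through 13.
library(zoo)

table.data <- rbind(c("2007", as.character(201:212)),
                    c("2008", as.character(211:222)))

years  <- as.numeric(table.data[, 1])
values <- as.numeric(t(table.data[, 2:13]))   # read month by month, row by row
dates  <- as.yearmon(rep(years, each = 12) + (0:11) / 12)

cpi.zoo <- zoo(values, order.by = dates)
head(cpi.zoo)
```

as.yearmon() encodes a month as year + (month - 1)/12, so the index lines up one observation per month; from there, merging series or calling plot() works as with any zoo object.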

-- 
View this message in context: http://www.nabble.com/Downloading-data-from-from-internet-tp25568930p25610171.html
Sent from the R help mailing list archive at Nabble.com.



