[R] postForm() in RCurl and library RHTMLForms
Duncan Temple Lang
duncan at wald.ucdavis.edu
Fri Nov 5 13:32:48 CET 2010
On 11/4/10 11:31 PM, sayan dasgupta wrote:
> Thanks a lot, that's exactly what I was looking for.
>
> Just a quick question. I agree the form gets submitted to the URL
> "http://www.nseindia.com/marketinfo/indices/histdata/historicalindices.jsp"
>
> and I am filling in the form on the page
> "http://www.nseindia.com/content/indices/ind_histvalues.htm"
>
> How do I submit arguments like FromDate, ToDate, and Symbol using postForm()
> and submit the query to get a similar table?
>
Well, that is what the function that RHTMLForms creates does.
If you look at its code, you will see that it calls formQuery(),
which ends in a call to postForm(). You could use
debug(postForm)
and examine the arguments passed to it.
postForm("...jsp", FromDate = "10-"
The answer is

o = postForm("http://www.nseindia.com/marketinfo/indices/histdata/historicalindices.jsp",
             FromDate = "01-11-2010", ToDate = "04-11-2010",
             IndexType = "S&P CNX NIFTY", check = "new",
             style = "POST")
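With style = "POST", the named arguments are sent as an application/x-www-form-urlencoded body. The sketch below uses only base R to show roughly what such a body looks like for the fields above. encode_form() is a hypothetical helper written for this illustration, not part of RCurl, and the exact escaping that libcurl applies may differ in detail.

```r
# Hypothetical helper (not part of RCurl): build a URL-encoded form body
# from a named list of fields.
encode_form <- function(params) {
  enc <- function(x) utils::URLencode(as.character(x), reserved = TRUE)
  keys <- vapply(names(params), enc, character(1))
  vals <- vapply(params, enc, character(1))
  paste(keys, vals, sep = "=", collapse = "&")
}

params <- list(FromDate = "01-11-2010", ToDate = "04-11-2010",
               IndexType = "S&P CNX NIFTY", check = "new")

body <- encode_form(params)
print(body)
# "FromDate=01-11-2010&ToDate=04-11-2010&IndexType=S%26P%20CNX%20NIFTY&check=new"
```

Note how the "&" inside the index name has to be escaped so that it is not mistaken for a field separator.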
>
>
>
>
>
>
> On Fri, Nov 5, 2010 at 6:43 AM, Duncan Temple Lang
> <duncan at wald.ucdavis.edu>wrote:
>
>>
>>
>> On 11/4/10 2:39 AM, sayan dasgupta wrote:
>>> Hi RUsers,
>>>
>>> Suppose I want to see the data on the website
>>> url <- "http://www.nseindia.com/content/indices/ind_histvalues.htm"
>>>
>>> for the index "S&P CNX NIFTY" for
>>> dates "FromDate"="01-11-2010","ToDate"="02-11-2010"
>>>
>>> then read the HTML table from the page using readHTMLTable()
>>>
>>> I am using this code
>>> webpage <- postForm(url, .params = list(
>>>                       "FromDate" = "01-11-2010",
>>>                       "ToDate" = "02-11-2010",
>>>                       "IndexType" = "S&P CNX NIFTY",
>>>                       "Indicesdata" = "Get Details"),
>>>                     .opts = list(useragent = getOption("HTTPUserAgent")))
>>>
>>> But it doesn't give me the desired result
>>
>> You need to be more specific about how it fails to give the desired result.
>>
>> You are in fact posting to the wrong URL. The form is submitted to a
>> different URL:
>> http://www.nseindia.com/marketinfo/indices/histdata/historicalindices.jsp
>>
>>
>>
>>>
>>> Also I was trying to use the function getHTMLFormDescription from the
>>> package RHTMLForms but there we can't use the argument
>>> .opts=list(useragent = getOption("HTTPUserAgent")) which is needed for
>>> this particular website
>>
>> That's not the case. The function RHTMLForms will generate for you does
>> support the .opts parameter.
>>
>> What you want is something along these lines:
>>
>>
>> # Set default options for RCurl requests
>> options(RCurlOptions = list(useragent = "R"))
>> library(RCurl)
>>
>> # Fetch the HTML page ourselves, since we cannot use htmlParse() directly:
>> # it does not specify the user agent or an Accept: */* header
>>
>> url <- "http://www.nseindia.com/content/indices/ind_histvalues.htm"
>> wp = getURLContent(url)
>>
>> # Now that we have the page, parse it and use the RHTMLForms
>> # package to create an R function that will act as an interface
>> # to the form.
>> library(RHTMLForms)
>> library(XML)
>> doc = htmlParse(wp, asText = TRUE)
>> # need to set the URL for this document since we read it from
>> # text, rather than from the URL directly
>>
>> docName(doc) = url
>>
>> # Create the form description and generate the R
>> # function to "call" the form
>>
>> form = getHTMLFormDescription(doc)[[1]]
>> fun = createFunction(form)
>>
>>
>> # Now we can invoke the form from R. We only need 2
>> # inputs - FromDate and ToDate
>>
>> o = fun(FromDate = "01-11-2010", ToDate = "04-11-2010")
>>
>> # Having looked at the tables, I think we want the 3rd one.
>> table = readHTMLTable(htmlParse(o, asText = TRUE),
>>                       which = 3,
>>                       header = TRUE,
>>                       stringsAsFactors = FALSE)
>> table
>>
>>
>>
>>
>> Yes, it is marginally involved, but that is because we cannot simply read
>> the HTML document directly with htmlParse(), which does not send the
>> Accept (and user-agent) HTTP headers.
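
The step of "looking at the tables" to pick the right index can be sketched as follows. This is a minimal illustration, assuming the XML package is installed; the inline HTML is a made-up stand-in for the server response (placeholder values, not real index data), so the table of interest here is the 2nd one rather than the 3rd.

```r
library(XML)

# Made-up stand-in for a server response: one layout table, one data table.
html <- paste0(
  "<html><body>",
  "<table><tr><th>Note</th></tr><tr><td>layout only</td></tr></table>",
  "<table><tr><th>Date</th><th>Close</th></tr>",
  "<tr><td>day 1</td><td>100</td></tr>",
  "<tr><td>day 2</td><td>101</td></tr></table>",
  "</body></html>")

doc <- htmlParse(html, asText = TRUE)

# Read every table, then inspect the dimensions to decide which one
# holds the data.
tables <- readHTMLTable(doc, header = TRUE, stringsAsFactors = FALSE)
print(lapply(tables, dim))

# Extract only the table of interest by position.
tbl <- readHTMLTable(doc, which = 2, header = TRUE, stringsAsFactors = FALSE)
print(tbl)
```

Applied to the real response, the same inspection is what would lead to a choice such as which = 3.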
>>
>>>
>>>
>>> Thanks and Regards
>>> Sayan Dasgupta
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>