[R] RCurl::postForm() -- how does one determine what the names are of each form element in an online html form?

Tony Breyal tony.breyal at googlemail.com
Wed Dec 10 19:29:39 CET 2008


Thank you, Felix, and also the individual who replied off-list.

re: html code: you are both correct that the form elements are named
within the html code of a simple form, and I thank you both for letting
me know about this. For simple forms I think I will try to write myself
a function which can identify these elements automatically, probably
using the XML package. It looks to me like R only has to inspect the
contents of the <form></form> tag: each field's name is the name
attribute of its <input>, <textarea> or <select> element.
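A minimal sketch of such a function, assuming the XML package is installed. The name `form_field_names` is hypothetical (it is not part of XML or RCurl); it collects the `name` attribute of every input, textarea, select and button inside a <form>, which are the names postForm() expects as arguments. Controls without a name, such as an unnamed submit button, are skipped because a browser would not send them either.

```r
library(XML)

# Hypothetical helper (not part of XML or RCurl): return the 'name'
# attributes of the submittable controls inside every <form> on a page.
# 'src' is a file name, or the HTML itself when asText = TRUE.
form_field_names <- function(src, asText = FALSE) {
  # error = function(...) {} silences libxml2's complaints about sloppy HTML
  doc <- htmlParse(src, asText = asText, error = function(...) {})
  # Only controls that carry a name attribute are submitted with the form
  xp <- paste0("//form//*[self::input or self::textarea or ",
               "self::select or self::button][@name]")
  nodes <- getNodeSet(doc, xp)
  # One name per node, in document order
  unname(vapply(nodes, xmlGetAttr, character(1), name = "name"))
}
```

Applied to the Calais test page below, it should give "licenseID", "content" and "paramsXML".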

re: javascript and Ajax: I think these are beyond my current skill and
I was not able to investigate them, but thank you for the suggestion in
this direction; it may be something I can learn about in the future.
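For reference, Felix's suggestion below (imitating the page's Ajax call) might be sketched with RCurl as follows. The endpoint and the 'content' and 'type' fields come from his reply; the helper names are hypothetical, and whether the 2008 service still answers is untested. The sketch assumes postForm() with style = "POST" URL-encodes each value through its .contentEncodeFun argument (curlEscape by default), so the raw text is passed in rather than pre-encoded.

```r
library(RCurl)

# Endpoint taken from Felix's reply (2008); it may no longer exist.
bridge_uri <- "http://sws.clearforest.com/calaisViewer//Bridge.asmx/BridgeMe"

# Hypothetical helper: the two fields the javascript appears to send.
bridge_fields <- function(text) {
  list(content = text, type = "text/txt")
}

# Hypothetical helper: style = "POST" sends an
# application/x-www-form-urlencoded body, with each value escaped by
# postForm()'s .contentEncodeFun (curlEscape by default).
post_to_bridge <- function(text, uri = bridge_uri) {
  postForm(uri, .params = bridge_fields(text), style = "POST")
}

# If the call works, the reply should be an XML block, e.g.:
# res <- post_to_bridge("Some text to analyse.")
# doc <- XML::xmlParse(res, asText = TRUE)
```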


For anyone who searches on this topic, the html form elements in the
html code below are called "licenseID", "content" and "paramsXML".

### html code example start ###
<html>
<head> <title>Calais test page</title> </head>
<body>
 <form action="http://api.opencalais.com/enlighten/rest/" method="post" accept-charset="utf-8">
       licenseID: <input type="text" name="licenseID" />
       <input type="submit" /><br />
       content: <br />
       <textarea rows="15" cols="80" name="content" ></textarea><br />
       paramsXML:  <br />
       <textarea rows="15" cols="80" name="paramsXML"></textarea><br />
</form>
</body>
</html>
### html code example end ###
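With those names known, submitting this form from R might look like the untested sketch below. The helper names are hypothetical, the license key is a placeholder you must supply, and paramsXML is left empty because its expected format is not covered here. Because the <form> declares no enctype, a browser would send it URL-encoded, which is what style = "POST" asks postForm() to do.

```r
library(RCurl)

# Hypothetical helper: field names read from the name attributes above.
calais_fields <- function(license_id, text, params_xml = "") {
  list(licenseID = license_id, content = text, paramsXML = params_xml)
}

# Hypothetical helper: the form's action attribute is the URI to post to.
submit_calais <- function(fields,
                          uri = "http://api.opencalais.com/enlighten/rest/") {
  # style = "POST": application/x-www-form-urlencoded, matching a form
  # with no enctype attribute
  postForm(uri, .params = fields, style = "POST")
}

# Usage (needs network access and a real license key):
# res <- submit_calais(calais_fields("YOUR-LICENSE-ID", "Some text."))
```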

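### something like the following would help identify the form element
### names, I think (it returns the visible label text before each field,
### which on this page happens to match the name attributes)
library(XML)
src.file <- "calais_test.html"  # hypothetical path to a file holding the html code above
html <- htmlTreeParse(src.file, useInternalNodes = TRUE, error = function(...) {})
xpathApply(html, "//body//form//text()", xmlValue)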
[[1]]
[1] "\r\n\tlicenseID: "

[[2]]
[1] "\r\n\tcontent: "

[[3]]
[1] "\r\n\tparamsXML:  "
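Besides the field names, postForm() also needs the URI to post to, which is the form's action attribute, and the method tells you whether a POST is expected at all. A small sketch using the XML package; the helper name `form_targets` is hypothetical, and the "get" fallback follows the HTML rule that a form without a method attribute is submitted with GET.

```r
library(XML)

# Hypothetical helper: the action (URI to post to) and method of each <form>.
# 'src' is a file name, or the HTML itself when asText = TRUE.
form_targets <- function(src, asText = FALSE) {
  doc <- htmlParse(src, asText = asText, error = function(...) {})
  forms <- getNodeSet(doc, "//form")
  data.frame(
    action = vapply(forms, xmlGetAttr, character(1),
                    name = "action", default = NA_character_),
    # HTML treats a missing method as GET; lower-case for easy comparison
    method = tolower(vapply(forms, xmlGetAttr, character(1),
                            name = "method", default = "get")),
    stringsAsFactors = FALSE
  )
}
```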


Cheers,
Tony Breyal


On 10 Dec, 03:06, "Felix Andrews" <fe... at nfrac.org> wrote:
> 2008/12/10 Tony Breyal <tony.bre... at googlemail.com>:
>
> > Dear R-Help,
>
> > I am looking into using the Open Calais web service
> > (http://sws.clearforest.com/calaisViewer/) for text mining purposes. I would
> > like to use R to post text into one of the forms on their website.
>
> > In package RCurl, there is a function called postForm(). This sounds
> > like it would do the job. Unfortunately the URL used in the example is
> > no longer valid (I have emailed the maintainer about this).
>
> > Question: How does one determine the name of the form elements to use?
> > Is there an R function that will print out the names of these
> > elements perhaps?
>
> > [I am still learning, so please forgive me if I used the wrong
> > terminology.]
>
> > ### Example from ?postForm ###
> > library(RCurl)
> > # Now looking at POST method for forms.
> > postForm("http://www.speakeasy.org/~cgires/perl_form.cgi",
> >            "some_text" = "Duncan",
> >            "choice" = "Ho",
> >            "radbut" = "eep",
> >            "box" = "box1, box2"
> >          )
> > ### Example ends ###
>
> > So in the above code, I believe the form elements are: "some_text",
> > "choice", "radbut" and "box". But how does one find out the names of
> > these form elements if one is not given them previously?
>
> You need to look at the HTML source of the web page to work out what
> the form elements are called. However, in your case, it is not a
> simple form; rather the form is handled by a javascript function which
> uses Ajax to handle the request. Imitating the javascript, you may be
> able to post the data to http://sws.clearforest.com/calaisViewer//Bridge.asmx/BridgeMe
> with an element 'content' containing the (url-encoded) text, and an
> element 'type' = 'text/txt'.
>
> If that works it would return the result in an XML block.
>
> > I hope that the above made sense, and thank you kindly in advance for
> > any help.
> > Tony Breyal.
>
> > ______________________________________________
> > R-h... at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> --
> Felix Andrews / 安福立 http://www.neurofractal.org/felix/
> 3358 543D AAC6 22C2 D336  80D9 360B 72DD 3E4C F5D8


