[R] Navigating web pages using R
marchywka at hotmail.com
Tue Jan 4 21:10:55 CET 2011
> Date: Tue, 4 Jan 2011 10:54:19 -0800
> From: egregory2007 at yahoo.com
> To: r-help at r-project.org
> Subject: [R] Navigating web pages using R
> I'm trying to obtain some data from a webpage which masks the URL from the user,
> so an explicit URL will not work. For example, when one navigates to the web
> page the URL looks something like:
> http://22.214.171.124/rpt34s.php?flags=1 (changed for privacy, but i'm not sure
> you could access it anyways since it's internal to the agency I work for).
LOL, presuming you are not a disgruntled employee, it is always amusing to
see some entity with a fancy cryptic web design drink its own Kool-Aid :)
This is the most annoying kind of code to write, especially when there is
no reason, such as a revenue model, to make the data hard to get. I've posted
in other forums about the general need for an API if you are providing data to
others in a non-hostile setting.
> The site has three drop-down menus for "Site", "Month," and "Year". When a
> combination is selected of these, the resulting URL is
> always http://126.96.36.199/rpt34s (nothing changes, except "flags=1" is
> dropped, so what I need to be able to do is write something that will navigate
> to the original URL, then select some combination of "Site", "Month", and
> "Year," and then submit the query to the site to navigate to the page with the
> Is this a capability that R has as a language? Unfortunately, I'm unfamiliar
> with html or php programming, so if this question belongs in a forum on that I
> apologize. I'm trying to centralize all of my code for my analysis in R!
I'm sure that ultimately you can code this in R, but for digging out what
you need there may be better approaches.
First I would try to contact the page author or determine whether there is
a better way to get the same data. Failing that, you may be able to find
a "form" section in the HTML and copy that. Firefox is supposed to have
something called Firebug that lets you see what the page does, but I've never
actually used it. Generally I use Linux or Cygwin command-line tools to
diagnose this junk. R may support some of these features, but this is a common
issue outside of R too, so it may be worthwhile learning the other tools. If
all else fails, including downloading a local copy of the page, you may be able
to do a packet capture and just see what it does by brute force.
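As a rough illustration of the "find the form section" step, here is a minimal
sketch in Python (standard library only) that reads a saved copy of a page and
lists the form's target and the values offered by each drop-down. The markup
in SAMPLE_HTML, the field names "site", "month" and "year", and the FormFinder
class are all made up for illustration; the real page will differ.

```python
from html.parser import HTMLParser

# Hypothetical saved copy of the page; the real markup is unknown.
SAMPLE_HTML = """
<form action="rpt34s.php" method="post">
  <select name="site"><option value="1">Site A</option><option value="2">Site B</option></select>
  <select name="month"><option value="01">Jan</option></select>
  <select name="year"><option value="2010">2010</option><option value="2011">2011</option></select>
  <input type="submit" name="go" value="Submit">
</form>
"""


class FormFinder(HTMLParser):
    """Collect the form's action/method and the option values of each <select>."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.method = None
        self.selects = {}      # select name -> list of option values
        self._current = None   # name of the <select> currently being read

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            self.action = a.get("action")
            # HTML forms default to GET when no method is given.
            self.method = (a.get("method") or "get").lower()
        elif tag == "select":
            self._current = a.get("name")
            self.selects[self._current] = []
        elif tag == "option" and self._current is not None:
            self.selects[self._current].append(a.get("value"))

    def handle_endtag(self, tag):
        if tag == "select":
            self._current = None


finder = FormFinder()
finder.feed(SAMPLE_HTML)
print(finder.action, finder.method)
for name, values in finder.selects.items():
    print(name, values)
```

The action, the method, and the name/value pairs are what a script needs in
order to repeat the submission without a browser.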
From what I have seen, the R tools are pretty much named after the Linux tools,
curl for example.
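Once the form fields are known, the submission itself is just an HTTP request
carrying name=value pairs. The sketch below builds such a request with
Python's standard library without sending it. It assumes the form uses POST,
which would be consistent with the URL not changing but is not confirmed; the
host example.invalid, the field names, and the build_form_request helper are
placeholders.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request


def build_form_request(page_url, action, fields):
    """Build (but do not send) a POST request that mimics a form submission."""
    target = urljoin(page_url, action)         # resolve a relative form action
    body = urlencode(fields).encode("ascii")   # e.g. site=1&month=01&year=2011
    return Request(
        target,
        data=body,  # supplying data makes urllib issue a POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# All names and values below are placeholders, not the real site's fields.
req = build_form_request(
    "http://example.invalid/rpt34s.php?flags=1",
    "rpt34s.php",
    {"site": "1", "month": "01", "year": "2011"},
)
print(req.get_method(), req.full_url)
print(req.data.decode("ascii"))
```

Passing req to urllib.request.urlopen() would actually send it. On the R side,
the RCurl package offers postForm() and getForm() for the same kind of
submission.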
> Thank you,
> -Erik Gregory
> Student Assistant, California EPA
> CSU Sacramento, Mathematics