[R] Opening or activating a URL to access data, alternative to browseURL
Duncan Murdoch
murdoch.duncan at gmail.com
Tue Oct 11 15:21:35 CEST 2016
On 11/10/2016 7:59 AM, Ryan Utz wrote:
> Bob/Duncan,
>
> Thanks for writing. I think some of the things Bob mentioned might work,
> but I'm still not quite getting there. Below is the example I'm working
> with:
>
It worked for me when I replaced the browseURL call with a readLines
call, as I suggested the other day. What went wrong for you?
Duncan Murdoch
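
For readers following along, here is a minimal base-R sketch of the readLines approach described above. It assumes the server only needs the first page to be requested before the file appears (no cookies), and that the two URL patterns generalise from the single example in this thread. The helper names build_url(), data_url() and fetch_species() are hypothetical, as are the default site and years, which are simply copied from the example.

```r
# Base-R sketch: request the "build" page without a browser, then read the data.
# ASSUMPTION: URL patterns are inferred from the one example in this thread.
base <- "http://pick18.discoverlife.org"

# URL of the page that triggers the server-side build (spaces become "+").
build_url <- function(species, site = "33.9 -83.3", years = 2011:2013) {
  paste0(base, "/mp/20m?plot=2",
         "&kind=", gsub(" ", "+", species, fixed = TRUE),
         "&site=", gsub(" ", "+", site, fixed = TRUE),
         "&date1=", paste(years, collapse = ","),
         "&flags=build_txt:")
}

# URL of the generated text file (spaces become "_").
data_url <- function(species, site = "33.9 -83.3", years = 2011:2013) {
  paste0(base, "/tmp/",
         gsub(" ", "_", species, fixed = TRUE), "_",
         gsub(" ", "_", site, fixed = TRUE), "_",
         paste(years, collapse = ","), ".txt")
}

fetch_species <- function(species, ..., wait = 2) {
  # Reading the page makes the same request browseURL() did,
  # but no browser window is opened.
  invisible(readLines(build_url(species, ...), warn = FALSE))
  Sys.sleep(wait)  # "wait a bit" so the server can write the file
  read.delim(data_url(species, ...))
}

# Batch use (needs network access; the species vector is illustrative):
# species <- c("Hypoprepia fucosa")
# results <- lapply(species, fetch_species)
```

If the second read still fails after this, the site is probably tracking something about the client (for example a cookie), and a session-aware HTTP client would be the next thing to try.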
> #1
> browseURL('http://pick18.discoverlife.org/mp/20m?plot=2&kind=Hypoprepia+fucosa&site=33.9+-83.3&date1=2011,2012,2013&flags=build_txt:')
> # This opens the URL and creates a link to machine-readable data on the
> # page, which I can then download by simply doing this:
>
> #2
> read.delim('http://pick18.discoverlife.org/tmp/Hypoprepia_fucosa_33.9_-83.3_2011,2012,2013.txt')
> # This is what I need to read in terms of data, but this URL only exists
> # if the URL run above is activated first
>
> So, for example, try running line #2 without the first line- it won't
> work. Next run #1 then #2- works fine.
>
> See what I mean?
>
>
> On Thu, Sep 29, 2016 at 5:09 PM, Bob Rudis <bob at rud.is> wrote:
>
> The rvest/httr/curl trio can do the cookie management pretty well.
> Make the initial connection via rvest::html_session() and then
> hopefully be able to use other rvest function calls, but curl and
> httr calls will use the cached in-memory handle info seamlessly.
> You'd need to store and retrieve cookies if you need them preserved
> between R sessions.
>
> Failing the above and assuming this would not need to be lightning
> fast, use the phantomjs or firefox web driver (either with RSelenium
> or some new stuff rOpenSci is cooking up) which will then do what
> browsers do best and maintain all this state for you. You can still
> slurp the page contents up with xml2::read_html() and use the super
> handy processing idioms in the scraping tidyverse (it needs its own
> name).
>
> A concrete example (assuming the URLs aren't sensitive) would enable
> me or someone else to mock up something for you.
>
>
> On Thu, Sep 29, 2016 at 4:59 PM, Duncan Murdoch
> <murdoch.duncan at gmail.com> wrote:
>
> On 29/09/2016 3:29 PM, Ryan Utz wrote:
>
> Hi all,
>
> I've got a situation that involves activating a URL so that a link to some
> data becomes available for download. I can easily use 'browseURL' to do so,
> but I'm hoping to make this batch-process-able, and I would prefer to not
> have 100s of browser windows open when I go to download multiple data sets.
>
> Here's the example:
>
> #1
> browseURL('http://pick18.discoverlife.org/mp/20m?plot=2&kind=Hypoprepia+fucosa&site=33.9+-83.3&date1=2011,2012,2013&flags=build_txt:')
> # This opens the URL and creates a link to machine-readable data on the
> # page, which I can then download by simply doing this:
>
> #2
> read.delim('http://pick18.discoverlife.org/tmp/Hypoprepia_fucosa_33.9_-83.3_2011,2012,2013.txt')
>
> However, I can only get the second line above to work if the thing in line
> #1 has been opened in a browser already. Is there any way to allow me to
> either 1) close the browser after it's been opened or 2) execute the line
> #2 above without having to open a browser? We have hundreds of species that
> you can see after the '&kind=' bit of the URL, so I'm trying to keep the
> browsing situation sane.
>
> Thanks!
> R
>
>
> You'll need to figure out what happens when you open the first
> page. Does it set a cookie? Does it record your IP address?
> Does it just build the file but record nothing about you?
>
> If it's one of the simpler versions, you can just read the first
> page, wait a bit, then read the second one.
>
> If you need to manage cookies, you'll need something more
> complicated. I don't know the easiest way to do that.
>
> Duncan Murdoch
>
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
>
>
> --
>
> Ryan Utz, Ph.D.
> Assistant professor of water resources
> *chatham**UNIVERSITY*
> Home/Cell: (724) 272-7769
>
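
In case the site does turn out to depend on cookies, the following is a rough sketch of the httr route Bob mentions in the quoted text. It assumes the httr package is installed and that reusing a single handle for both requests is enough to carry any cookie the first page sets; whether this particular site needs that has not been verified here. fetch_with_session() is a hypothetical helper name, not a function from any package.

```r
# Sketch of a cookie-aware variant using httr (ASSUMPTION: httr is installed).
# fetch_with_session() is a hypothetical helper, not part of httr.
fetch_with_session <- function(build_page, data_file, wait = 2) {
  # One handle for the host keeps cookies and the connection across requests.
  h <- httr::handle("http://pick18.discoverlife.org")

  # Request the page that triggers the build; stop on an HTTP error.
  httr::stop_for_status(httr::GET(build_page, handle = h))
  Sys.sleep(wait)  # give the server a moment to write the file

  # Fetch the generated file with the same handle, so any cookie is sent back.
  resp <- httr::GET(data_file, handle = h)
  httr::stop_for_status(resp)
  read.delim(text = httr::content(resp, as = "text", encoding = "UTF-8"))
}
```

Calling it with the two URLs from the example (#1 as build_page, #2 as data_file) would return the data frame without opening a browser. The cookies live only in memory, so they would need to be saved and restored separately to survive between R sessions.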