[R] Opening or activating a URL to access data, alternative to browseURL
Ryan Utz
utz.ryan at gmail.com
Wed Oct 12 23:28:39 CEST 2016
Eureka! I wish I could send a box of digital donuts. Thanks so much!!!!
On Tue, Oct 11, 2016 at 9:21 AM, Duncan Murdoch <murdoch.duncan at gmail.com>
wrote:
> On 11/10/2016 7:59 AM, Ryan Utz wrote:
>
>> Bob/Duncan,
>>
>> Thanks for writing. I think some of the things Bob mentioned might work,
>> but I'm still not quite getting there. Below is the example I'm working
>> with:
>>
>>
> It worked for me when I replaced the browseURL call with a readLines call,
> as I suggested the other day. What went wrong for you?
>
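> Concretely, the swap I have in mind is something along these lines
> (the URLs are the ones from your example below):
>
> # request the page that builds the .txt file, discarding the returned
> # HTML instead of opening a browser window
> invisible(readLines('http://pick18.discoverlife.org/mp/20m?plot=2&kind=Hypoprepia+fucosa&site=33.9+-83.3&date1=2011,2012,2013&flags=build_txt:', warn = FALSE))
> # the generated file should now exist and can be read directly
> dat <- read.delim('http://pick18.discoverlife.org/tmp/Hypoprepia_fucosa_33.9_-83.3_2011,2012,2013.txt')
>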
> Duncan Murdoch
>
>> #1
>> browseURL('http://pick18.discoverlife.org/mp/20m?plot=2&kind=Hypoprepia+fucosa&site=33.9+-83.3&date1=2011,2012,2013&flags=build_txt:')
>> # This opens the URL and creates a link to machine-readable data on the
>> page, which I can then download by simply doing this:
>>
>> #2
>> read.delim('http://pick18.discoverlife.org/tmp/Hypoprepia_fucosa_33.9_-83.3_2011,2012,2013.txt')
>> # This is the data I need to read, but this URL only exists if the
>> # URL in #1 has been activated first
>>
>> So, for example, try running line #2 without the first line: it won't
>> work. Then run #1 followed by #2: it works fine.
>>
>> See what I mean?
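>>
>> For scale, what I'm ultimately after is a loop over species along
>> these lines (just a sketch: the second species name is a placeholder,
>> I'm assuming the .txt name follows the same pattern for every species,
>> and step 1 is exactly the part I'd rather not do through a browser):
>>
>> species <- c('Hypoprepia+fucosa', 'Genus+species')  # placeholders for our real list
>> results <- list()
>> for (sp in species) {
>>   # step 1: activate the page that builds the .txt file (currently one browser window per species)
>>   browseURL(paste0('http://pick18.discoverlife.org/mp/20m?plot=2&kind=', sp,
>>                    '&site=33.9+-83.3&date1=2011,2012,2013&flags=build_txt:'))
>>   # step 2: read the file that step 1 generated
>>   results[[sp]] <- read.delim(paste0('http://pick18.discoverlife.org/tmp/',
>>                                      chartr('+', '_', sp), '_33.9_-83.3_2011,2012,2013.txt'))
>> }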
>>
>>
>> On Thu, Sep 29, 2016 at 5:09 PM, Bob Rudis <bob at rud.is> wrote:
>>
>> The rvest/httr/curl trio can do the cookie management pretty well.
>> Make the initial connection via rvest::html_session(); you should then
>> be able to use other rvest function calls, and curl and httr calls
>> will use the cached in-memory handle info seamlessly. You'd need to
>> store and retrieve cookies yourself if you need them preserved between
>> R sessions.
>>
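>> A rough sketch of that route (untested, using the URLs from the
>> example further down the thread):
>>
>> library(rvest)
>> library(httr)
>> # establish a session; the cookies/handle are cached in memory for this R session
>> s <- html_session('http://pick18.discoverlife.org/mp/20m?plot=2&kind=Hypoprepia+fucosa&site=33.9+-83.3&date1=2011,2012,2013&flags=build_txt:')
>> # a follow-up httr call should reuse the cached handle, so the generated file can be fetched directly
>> txt <- content(GET('http://pick18.discoverlife.org/tmp/Hypoprepia_fucosa_33.9_-83.3_2011,2012,2013.txt'), as = 'text')
>> dat <- read.delim(text = txt)
>>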
>> Failing the above and assuming this would not need to be lightning
>> fast, use the phantomjs or firefox web driver (either with RSelenium
>> or some new stuff rOpenSci is cooking up) which will then do what
>> browsers do best and maintain all this state for you. You can still
>> slurp the page contents up with xml2::read_html() and use the super
>> handy processing idioms in the scraping tidyverse (it needs its own
>> name).
>>
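>> For the webdriver route, a sketch along these lines (untested, and it
>> assumes a phantomjs webdriver is already listening on port 4444):
>>
>> library(RSelenium)
>> library(xml2)
>> remDr <- remoteDriver(browserName = 'phantomjs', port = 4444L)
>> remDr$open()
>> # the headless browser handles cookies/state; this visit builds the .txt file
>> remDr$navigate('http://pick18.discoverlife.org/mp/20m?plot=2&kind=Hypoprepia+fucosa&site=33.9+-83.3&date1=2011,2012,2013&flags=build_txt:')
>> page <- read_html(remDr$getPageSource()[[1]])
>> remDr$close()
>>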
>> A concrete example (assuming the URLs aren't sensitive) would enable
>> me or someone else to mock up something for you.
>>
>>
>> On Thu, Sep 29, 2016 at 4:59 PM, Duncan Murdoch
>> <murdoch.duncan at gmail.com> wrote:
>>
>> On 29/09/2016 3:29 PM, Ryan Utz wrote:
>>
>> Hi all,
>>
>> I've got a situation that involves activating a URL so that a link to
>> some data becomes available for download. I can easily use 'browseURL'
>> to do so, but I'm hoping to make this batch-process-able, and I would
>> prefer to not have 100s of browser windows open when I go to download
>> multiple data sets.
>>
>> Here's the example:
>>
>> #1
>> browseURL('http://pick18.discoverlife.org/mp/20m?plot=2&kind=Hypoprepia+fucosa&site=33.9+-83.3&date1=2011,2012,2013&flags=build_txt:')
>> # This opens the URL and creates a link to machine-readable data on the
>> # page, which I can then download by simply doing this:
>>
>> #2
>> read.delim('http://pick18.discoverlife.org/tmp/Hypoprepia_fucosa_33.9_-83.3_2011,2012,2013.txt')
>>
>> However, I can only get the second line above to work if the thing in
>> line #1 has been opened in a browser already. Is there any way to allow
>> me to either 1) close the browser after it's been opened or 2) execute
>> the line #2 above without having to open a browser? We have hundreds of
>> species that you can see after the '&kind=' bit of the URL, so I'm
>> trying to keep the browsing situation sane.
>>
>> Thanks!
>> R
>>
>>
>> You'll need to figure out what happens when you open the first
>> page. Does it set a cookie? Does it record your IP address?
>> Does it just build the file but record nothing about you?
>>
>> If it's one of the simpler versions, you can just read the first
>> page, wait a bit, then read the second one.
>>
>> If you need to manage cookies, you'll need something more
>> complicated. I don't know the easiest way to do that.
>>
>> Duncan Murdoch
>>
>>
>>
>>
>>
>>
>>
>> --
>>
>> Ryan Utz, Ph.D.
>> Assistant professor of water resources
>> Chatham University
>> Home/Cell: (724) 272-7769
>>
>>
>
--
Ryan Utz, Ph.D.
Assistant professor of water resources
Chatham University
Home/Cell: (724) 272-7769