[Rd] [R] How to access https page
Jeroen Ooms
jeroen.ooms at stat.ucla.edu
Tue Mar 10 21:32:22 CET 2015
On Tue, Mar 10, 2015 at 12:56 PM, Hui <hui.du at savvyrookies.com> wrote:
> Thanks. However I got http error 999.
>
There is an additional complication here: LinkedIn doesn't want you to
scrape the website and denies requests from non-browser clients. To get
around this you need to set the "User-Agent" header to something that looks
like a browser. Try this:
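# Install the development version of curl from GitHub, then load it
devtools::install_github("jeroenooms/curl")
library(curl)
# Send a browser-like User-Agent (keep the string on a single line)
h <- new_handle()
handle_setheaders(h, "User-Agent" = "Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0")
txt <- readLines(curl("https://www.linkedin.com/in/huidu", handle = h))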
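
If the request still fails, it can help to look at the status code the
server actually returns instead of relying on the readLines() error. Below
is a rough sketch of one way to do that, assuming a version of the curl
package that provides curl_fetch_memory(). The helper names browser_handle()
and fetch_text() are made up for illustration and are not part of the
package.

```r
library(curl)

# Hypothetical helper: create a handle that sends the given User-Agent.
browser_handle <- function(user_agent) {
  h <- new_handle()
  handle_setheaders(h, "User-Agent" = user_agent)
  h
}

# Hypothetical helper: fetch a URL into memory and fail with the HTTP
# status code if the server did not answer 200.
fetch_text <- function(url, handle) {
  res <- curl_fetch_memory(url, handle = handle)
  if (res$status_code != 200) {
    stop("request failed with HTTP status ", res$status_code)
  }
  rawToChar(res$content)
}

# Example usage (needs network access, so it is left commented out):
# ua <- "Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0"
# txt <- fetch_text("https://www.linkedin.com/in/huidu", browser_handle(ua))
```

With this, a rejected request should show up as an error that names the
status code (for example 999), which makes it easier to tell a blocked
client from a connection problem.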
>
> Hui
>
> Sent from my iPhone
>
> On Mar 10, 2015, at 12:07 PM, Jeroen Ooms <jeroen.ooms at stat.ucla.edu>
> wrote:
>
>
>
> On Mon, Mar 9, 2015 at 3:39 PM, Hui Du <hui.du at savvyrookies.com> wrote:
>
>> > readLines(url)
>> Error in file(con, "r") : cannot open the connection
>> In addition: Warning message:
>> In file(con, "r") : unsupported URL scheme
>>
>
> Try:
>
> library(curl)
> readLines(curl(url))
>
>
>
More information about the R-devel mailing list