[R] Developing a web crawler / R "webkit" or something similar? [off topic]
Matt.Shotwell at Vanderbilt.Edu
Thu Mar 3 20:04:11 CET 2011
On 03/03/2011 08:07 AM, Mike Marchywka wrote:
>> Date: Thu, 3 Mar 2011 01:22:44 -0800
>> From: antujsrv at gmail.com
>> To: r-help at r-project.org
>> Subject: [R] Developing a web crawler
>> I wish to develop a web crawler in R. I have been using the functionalities
>> available under the RCurl package.
>> I am able to extract the HTML content of the site but I don't know how to go
> In general this can be a big effort but there may be things in
> However, I guess what I'd be looking for is something like a "webkit"
> package or other open source browser with or without an "R" interface.
> This actually may be an ideal solution for a lot of things as you get
> all the content handlers of at least some browser.
> Now that you mention it, I wonder if there are browser plugins to handle
> "R" content ( I'd have to give this some thought, put a script up as
> a web page with MIME type "text/R" and have it execute it in R. )
There are server-side solutions for this sort of thing. See
http://rapache.net/ . Also, there was a string of messages on R-devel
some years ago addressing the MIME type issue, beginning here:
http://tolstoy.newcastle.edu.au/R/devel/05/11/3054.html . Though I don't
know whether there was a resolution. Some suggestions were text/x-R,
>> about analyzing the html formatted document.
>> I wish to know the frequency of a word in the document. I am only acquainted
>> with analyzing data sets.
>> So how should I go about analyzing data that is not available in table
>> A few chunks of code that I wrote:
>> t <- readLines(w)
>> readLines also didn't prove to be of any help.
>> Any help would be highly appreciated. Thanks in advance.
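On the word-frequency question quoted above, here is a minimal sketch in
base R, not a definitive solution. It assumes the page is already in memory
as a character vector (for example, the result of readLines() or
RCurl::getURL()). The sample `html` string and the helper name `word_freq`
are made up for illustration. Stripping tags with regular expressions is
crude, and a real HTML parser (such as the XML package) would likely be more
robust on messy pages.

```r
# Hypothetical page content standing in for text fetched from a real site
html <- c(
  "<html><head><title>Demo</title><style>p {color: red}</style></head>",
  "<body><p>R is fun. Crawling with R is fun too!</p>",
  "<script>var x = 1;</script></body></html>"
)

# word_freq is an illustrative helper, not part of any package
word_freq <- function(html) {
  # Collapse a vector of lines (as returned by readLines) into one string
  txt <- paste(html, collapse = "\n")
  # Drop script and style blocks, whose contents are not visible text
  txt <- gsub("(?is)<(script|style)\\b.*?</\\1>", " ", txt, perl = TRUE)
  # Strip the remaining tags
  txt <- gsub("<[^>]+>", " ", txt)
  # Lower-case, then split on runs of anything that is not a letter a-z
  words <- strsplit(tolower(txt), "[^a-z]+", perl = TRUE)[[1]]
  words <- words[nzchar(words)]
  # Tabulate and sort, most frequent words first
  sort(table(words), decreasing = TRUE)
}

freq <- word_freq(html)
print(freq)
```

The result is an ordinary table, so freq["fun"] gives the count for a single
word and head(freq, 10) gives the ten most frequent words. Note that the
split pattern only keeps unaccented ASCII letters; text in other languages
would need a different pattern.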
>> R-help at r-project.org mailing list
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
Matthew S Shotwell Assistant Professor School of Medicine
Department of Biostatistics Vanderbilt University