[R] Extracting a website text content using R
mtmorgan at fhcrc.org
Thu Aug 2 04:08:07 CEST 2007
Perhaps more fun is
> library(XML)
> res = htmlTreeParse("http://www.omegahat.org/RSXML/", useInternalNodes=TRUE)
> xpathApply(res, "//h1", xmlValue)
[[1]]
[1] "An XML package for the S language"
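The same approach extends from headings to the whole visible text of a page. What follows is only a sketch under some assumptions: the HTML string, the variable names and the XPath filter are made up for illustration, and the page is held in a string (hence asText=TRUE) so that it runs without a network connection.

```r
library(XML)

# Illustrative page held in a string so the sketch runs without network access.
html <- paste(
  "<html><head><title>Demo</title><style>h1 {color: red}</style></head>",
  "<body><h1>A heading</h1><p>First <b>bold</b> paragraph.</p>",
  "<script>var x = 1;</script></body></html>",
  sep = "\n"
)

# asText = TRUE tells the parser that 'html' is markup, not a file name or URL.
doc <- htmlTreeParse(html, asText = TRUE, useInternalNodes = TRUE)

# Same idea as the "//h1" query, simplified to a character vector.
headings <- xpathSApply(doc, "//h1", xmlValue)

# Collect every text node under <body>, skipping script and style content.
pieces <- xpathSApply(
  doc,
  "//body//text()[not(ancestor::script) and not(ancestor::style)]",
  xmlValue
)

# Join the pieces, squeeze runs of whitespace, and trim the ends.
page_text <- gsub("[[:space:]]+", " ", paste(pieces, collapse = " "))
page_text <- sub("^ ", "", sub(" $", "", page_text))

print(headings)
print(page_text)
```

To try this on a live page, pass the URL as the first argument and drop asText, as in the call above. Joining the pieces with a space is a simplification; it can leave a stray space before punctuation that follows an inline tag.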
Martin
Quoting Steven McKinney <smckinney at bccrc.ca>:
>
>
> >-----Original Message-----
> >From: r-help-bounces at stat.math.ethz.ch on behalf of Am Stat
> >Sent: Wed 8/1/2007 2:19 PM
> >To: r-help at stat.math.ethz.ch
> >Subject: [R] Extracting a website text content using R
>
> >Dear useR,
>
> >Just wondering whether there is any function in R that could
> >let me get the text contents of a certain website.
>
> >Thanks a lot!
>
> >Best,
>
> >Leon
>
>
>
>
> Is this what you had in mind?
>
> > foo <- scan(url("http://cran.r-project.org/"), what = "character")
> Read 69 items
> > paste(unlist(foo), collapse = " ")
> [1] "<!DOCTYPE HTML PUBLIC -//IETF//DTD HTML//EN > <html> <head> <title>The
> Comprehensive R Archive Network</title> <link rel=\"icon\"
> href=\"favicon.ico\" type=\"image/x-icon\"> <link rel=\"shortcut icon\"
> href=\"favicon.ico\" type=\"image/x-icon\"> <link rel=\"stylesheet\"
> type=\"text/css\" href=\"R.css\"> </head> <FRAMESET cols=\"1*, 4*\" border=0>
> <FRAMESET rows=\"120, 1*\"> <FRAME src=\"logo.html\" name=\"logo\"
> frameborder=0> <FRAME src=\"navbar.html\" name=\"contents\" frameborder=0>
> </FRAMESET> <FRAME src=\"banner.shtml\" name=\"banner\" frameborder=0>
> <noframes> <h1>The Comprehensive R Archive Network</h1> Your browser seems
> not to support frames, here is the <A href=\"navbar.html\">contents page</A>
> of CRAN. </noframes> </FRAMESET>"
>
>
> Try the search phrase
>
> cran scan url
>
> in Google for more information about R functions that can deal with URLs.
>
> In R try
>
> > apropos("URL")
> [1] "contourLines" "URLdecode" "URLencode" "browseURL" "contrib.url" "main.help.url" "url.show"
> [8] "loadURL" "read.table.url" "scan.url" "source.url" "url"
>
>
> SteveM
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.