[R] Using CSS package to extract text from html

dstrick1 dstrick1 at vt.edu
Tue Jul 1 17:44:38 CEST 2014


This being my first post, I'm sure I'll do something discordant with
convention, so forgive me in advance.

Basically, I am trying to extract text from an html file using the CSS
package in R. However, I am unable to do so because it seems that the text
itself is not identified with any class and thus targeting it via the CSS
function `cssApply` is difficult. 

I'll provide some detailed information so that you may be able to spot
something I've missed. Let's say I want to extract the latitude/longitude
info from the following html:
http://va.water.usgs.gov/duration_plots/htm_7/dp02059500.htm 

Here's what the initial portion of my code would look like:

install.packages('CSS')
library(XML)   # htmlParse() is provided by the XML package
library(CSS)

url <- "http://va.water.usgs.gov/duration_plots/htm_7/dp02059500.htm"
doc <- htmlParse(url)


Now, the text I want to extract is at the following XPath (copied from
Chrome DevTools):
/html/body/table[1]/tbody/tr/td/table/tbody/tr[2]/td[2]/font/text()[1]

Would the next step be to extract the text at that path? If you need to see
for yourself how the site's html is structured, follow the link and use your
browser's inspect element tool.
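
For illustration, here is a minimal sketch of what extracting a text node by
XPath might look like, using `xpathSApply` and `xmlValue` from the XML package
rather than `cssApply`, since CSS selectors target elements, not bare text
nodes. The inline html is a hypothetical stand-in with placeholder values, not
the real USGS markup, which may differ. The sketch also assumes the `tbody`
steps in the DevTools path were added by the browser and are absent from the
raw html, so they are dropped from the path here.

```r
library(XML)

# Hypothetical stand-in for the page; values are placeholders, not real data.
html <- '<html><body><table><tr><td>
<table>
<tr><td>Station</td><td>00000000</td></tr>
<tr><td>Location</td><td><font>Lat 00 00 00, Long 00 00 00<br>Placeholder County</font></td></tr>
</table>
</td></tr></table></body></html>'

# asText = TRUE tells htmlParse() the argument is html content, not a file or URL
doc <- htmlParse(html, asText = TRUE)

# Same path as the DevTools one, minus the tbody steps (assumed browser-inserted)
path <- "/html/body/table[1]/tr/td/table/tr[2]/td[2]/font/text()[1]"

# xmlValue() returns the content of each matched text node as a string
latlong <- trimws(xpathSApply(doc, path, xmlValue))
print(latlong)
```

If the path matches nothing on the real page, trying it with and without the
`tbody` steps is probably the first thing to check.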

Any help would be appreciated. Thanks.




