[R] prevent XML::readHTMLTable from suppressing <br/>
Spencer Graves
spencer.graves at effectivedefense.org
Sat Jul 25 05:59:55 CEST 2020
Hello, All:
Thanks to Rasmus Liland, William Michels, and Luke Tierney for answering
my earlier web-scraping question. With their help, I've made progress.
Sadly, I still have one problem: one field contains "<br/>", which
XML::readHTMLTable drops, so the two lines of the address run together:
sosURL <-
"https://s1.sos.mo.gov/CandidatesOnWeb/DisplayCandidatesPlacement.aspx?ElectionCode=750004975"
sosChars <- RCurl::getURL(sosURL)
MOcan <- XML::readHTMLTable(sosChars)
MOcan[[2]][1, 2]
[1] "4476 FIVE MILE RDSENECA MO 64865"
(Seneca <- regexpr('SENECA', sosChars))
substring(sosChars, Seneca-22, Seneca+14)
[1] "4476 FIVE MILE RD<br/>SENECA MO 64865"
How can I get essentially the same result, but with the "<br/>" kept as
some kind of separator (e.g., a newline) instead of being silently
dropped by XML::readHTMLTable?
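One possible workaround, sketched here under the assumption that editing the raw HTML string before parsing is acceptable: replace every <br> variant with a visible separator, so the break survives as ordinary text inside the table cell. The helper name brToText and the default separator "\n" are illustrative choices, not part of the XML or RCurl packages.

```r
# Minimal sketch: turn <br>, <br/>, <br /> (any letter case) into a text
# separator in the raw HTML, before it is handed to a parser.
brToText <- function(html, sep = "\n") {
  # "<br", optional whitespace, an optional self-closing "/", then ">".
  # ignore.case = TRUE also catches <BR> and <Br/>.
  gsub("<br\\s*/?>", sep, html, ignore.case = TRUE, perl = TRUE)
}
```

With the objects above, this would be used as MOcan <- XML::readHTMLTable(brToText(sosChars)), and the address cell should then contain the separator between the street line and the city line. A separator such as " | " may be easier to read in a printed data frame than "\n". Note that this rewrites every <br> in the page, not only those inside the table; the same preprocessed string can also be passed to xml2::read_html.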
NOTE: I get the same run-together string with xml2::read_html and
rvest::html_table:
sosPointers <- xml2::read_html(sosChars)
MOcan2 <- rvest::html_table(sosPointers)
MOcan2[[2]][1, 2]
[1] "4476 FIVE MILE RDSENECA MO 64865"
In addition, MOcan2 has no names, and some of the fields are
automatically converted to integers, which I think is unwise in this
application.
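For the xml2/rvest route, a similar effect can be had on the parsed document instead of the raw string. This sketch assumes the xml2 package is installed; brToNewline is a hypothetical helper name. It uses a trick often seen with xml2: add a <p> node containing a newline next to each <br> element, then remove the <br> elements, so that text extraction sees "\n" where the break was. It does not address the missing names or the integer conversion.

```r
# Sketch, assuming xml2 is installed: replace each <br> element in a
# parsed document with a <p> node whose text is a newline.
brToNewline <- function(doc) {
  brs <- xml2::xml_find_all(doc, ".//br")
  # Unnamed arguments after the tag name become the new node's text.
  xml2::xml_add_sibling(brs, "p", "\n")
  # Drop the original <br> elements; the newline nodes stay in place.
  xml2::xml_remove(brs)
  # xml2 documents are external pointers, so doc has been modified in
  # place; it is returned invisibly only for convenience.
  invisible(doc)
}
```

Usage with the objects above would be: sosPointers <- xml2::read_html(sosChars); brToNewline(sosPointers); MOcan2 <- rvest::html_table(sosPointers). Because the change is made in the parsed tree, only real <br> elements are affected, not the string "<br/>" appearing elsewhere (for example, inside a script).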
Thanks,
Spencer Graves