[R] prevent XML::readHTMLTable from suppressing <br/>

Spencer Graves
Sat Jul 25 05:59:55 CEST 2020


Hello, All:


       Thanks to Rasmus Liland, William Michels, and Luke Tierney for 
their help with my earlier web scraping question; I've made progress.  
Sadly, I still have a problem:  one field contains "<br/>", which 
XML::readHTMLTable drops, running the two address lines together:


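# Download the page source as a single character string
sosURL <- 
"https://s1.sos.mo.gov/CandidatesOnWeb/DisplayCandidatesPlacement.aspx?ElectionCode=750004975"
sosChars <- RCurl::getURL(sosURL)
# Parse every HTML table on the page into a list of data frames
MOcan <- XML::readHTMLTable(sosChars)
MOcan[[2]][1, 2]
[1] "4476 FIVE MILE RDSENECA MO 64865"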


# Locate 'SENECA' in the raw source and show the characters around it
(Seneca <- regexpr('SENECA', sosChars))
substring(sosChars, Seneca-22, Seneca+14)


[1] "4476 FIVE MILE RD<br/>SENECA MO 64865"


       How can I get essentially the same result but without having 
XML::readHTMLTable suppress "<br/>"?


NOTE:  I get something very similar with xml2::read_html and 
rvest::html_table:


sosPointers <- xml2::read_html(sosChars)
MOcan2 <- rvest::html_table(sosPointers)
MOcan2[[2]][1, 2]
[1] "4476 FIVE MILE RDSENECA MO 64865"


       MOcan2 does not have names, and some of the fields are 
automatically converted to integers, which I think is not smart in this 
application.
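
       One possible workaround, offered as a sketch rather than a 
tested solution for this page:  replace each "<br/>" in the raw source 
with a separator before handing the text to the parser, so the break is 
ordinary text by the time the table is read.  The helper name 
mark_breaks, its sep argument, and the small demo table are 
hypothetical and stand in for the real page source; whether a newline 
separator survives later processing is an assumption to check against 
the actual data.

```r
# Hypothetical helper: turn <br>, <br/>, and <br /> tags into a separator
# string so the line break is kept as text when the table is parsed.
mark_breaks <- function(html, sep = "\n") {
  gsub("<br\\s*/?>", sep, html, ignore.case = TRUE, perl = TRUE)
}

# Small stand-in for the downloaded page source (not the real page)
demo <- paste0(
  "<table><tr><th>Name</th><th>Address</th></tr>",
  "<tr><td>A</td><td>4476 FIVE MILE RD<br/>SENECA MO 64865</td></tr>",
  "</table>"
)
marked <- mark_breaks(demo)

# Parse the modified source only if the XML package is installed
if (requireNamespace("XML", quietly = TRUE)) {
  doc <- XML::htmlParse(marked, asText = TRUE)
  tab <- XML::readHTMLTable(doc, stringsAsFactors = FALSE)
  print(tab[[1]])
}
```

       With the real page, the same idea would be 
XML::readHTMLTable(mark_breaks(sosChars)).  A visible marker such as 
sep = " | " may be easier to inspect than a newline.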


       Thanks,
       Spencer Graves


