[R] help with web scraping
Spencer Graves
@pencer@gr@ve@ @end|ng |rom e||ect|vede|en@e@org
Thu Jul 23 23:49:11 CEST 2020
Hello, All:
All of my attempts to scrape the table of
candidates from the Missouri Secretary of State's website have failed:
https://s1.sos.mo.gov/CandidatesOnWeb/DisplayCandidatesPlacement.aspx?ElectionCode=750004975
I've tried base::url, base::readLines, xml2::read_html, and
XML::readHTMLTable; see the summary below.
Suggestions?
Thanks,
Spencer Graves
sosURL <-
"https://s1.sos.mo.gov/CandidatesOnWeb/DisplayCandidatesPlacement.aspx?ElectionCode=750004975"
str(baseURL <- base::url(sosURL))
# this returns a connection object, but I don't know how to get the table from it
sosRead <- base::readLines(sosURL) # 404 Not Found
sosRb <- base::readLines(baseURL) # 404 Not Found
sosXml2 <- xml2::read_html(sosURL) # HTTP error 404.
sosXML <- XML::readHTMLTable(sosURL)
# List of 0; does not seem to be XML
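One thing that may be worth trying is sketched below. It rests on an untested assumption: some servers answer scripted requests differently from browsers, for example by filtering on the User-Agent header, and that could produce the 404 seen with readLines() and read_html(). The sketch fetches the page with the curl package (already loaded in the session below) using a browser-like User-Agent string, checks the HTTP status, and then hands the downloaded text to XML::readHTMLTable() instead of the URL. The helper names fetch_html() and parse_tables() and the User-Agent string are hypothetical.

```r
# Minimal sketch, ASSUMING the 404 is caused by the server rejecting R's
# default User-Agent. This is a guess, not a confirmed diagnosis.

# Download a page as a single character string, sending a browser-like
# User-Agent header. `browser_ua` is a hypothetical example value.
fetch_html <- function(url,
                       browser_ua = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5)") {
  h <- curl::new_handle(useragent = browser_ua)
  res <- curl::curl_fetch_memory(url, handle = h)
  # Stop early with the status code instead of parsing an error page.
  if (res$status_code != 200) {
    stop("HTTP status ", res$status_code, " for ", url)
  }
  rawToChar(res$content)
}

# Parse every <table> in an HTML string into a list of data frames.
parse_tables <- function(html_text) {
  doc <- XML::htmlParse(html_text, asText = TRUE)
  XML::readHTMLTable(doc)
}

# Usage (needs network access):
# sosTables <- parse_tables(fetch_html(sosURL))
# str(sosTables)
```

If the status is still 404, or the list of tables comes back empty, the candidate table may be built by JavaScript or an ASP.NET postback rather than being present in the initial HTML; in that case a browser-driving tool such as RSelenium would be the next thing to check.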
sessionInfo()

R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.5

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets 
[6] methods   base     

loaded via a namespace (and not attached):
[1] compiler_4.0.2 tools_4.0.2    curl_4.3      
[4] xml2_1.3.2     XML_3.99-0.3  