[R] [External] Re: help with web scraping
jr@| @end|ng |rom po@teo@no
Sun Jul 26 17:43:49 CEST 2020
Dear GRAVES et al.,
On 2020-07-25 12:43 -0500, Spencer Graves wrote:
> Dear Rasmus Liland et al.:
> On 2020-07-25 11:30, Rasmus Liland wrote:
> > On 2020-07-25 09:56 -0500, Spencer Graves wrote:
> > > Dear Rasmus et al.:
> > It is LILAND et al., is it not? ... else it's customary to
> > put a comma in there, isn't it? ...
> The APA Style recommends "Sharp et al., 2007":
If "Sharp et al., 2007" is an APA
citation of this book[*], Sharp is John A
Sharp's surname, Liland is my surname.
I have not used APA before (as I am not
a Psychiatrist), as the minimalism of
IEEE[**] always seemed more desirable.
> Regarding Confucius, I'm confused.
Never mind, just fooling around, that's
> > On 2020-07-25 04:10, Rasmus Liland wrote:
> > >
> > > However, this suppressed "<br/>"
> > > everywhere.
> > Why is that, please explain.
> I don't know why the Missouri
> Secretary of State's web site includes
> "<br/>" to signal a new line, but it
Me neither! On top of that, <br /> is
actually[***] an XHTML tag, not an HTML
> I also don't know why
> XML::readHTMLTable suppressed "<br/>"
> everywhere it occurred, but it did
Yes, I know, I also observed this. But
now we swiftly solved this by gsubbing it
with the newline char, "\n", which does
not make sense for HTML parsers anyway.
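
For the archive, a minimal sketch of that workaround. The table
fragment in `html` is made up for illustration (it is not the actual
Secretary of State page), and the parsing step assumes the XML package
is installed:

```r
# Hypothetical HTML fragment; the real page's markup is not shown in this thread.
html <- paste0(
  "<table>",
  "<tr><th>Office</th><th>Candidate</th></tr>",
  "<tr><td>Governor</td><td>Jane Doe<br/>123 Main St</td></tr>",
  "</table>"
)

# Replace each literal "<br/>" with a newline before parsing.
# fixed = TRUE treats the pattern as a plain string, not a regex.
patched <- gsub("<br/>", "\n", html, fixed = TRUE)

# Parse only if the XML package is available.
if (requireNamespace("XML", quietly = TRUE)) {
  doc <- XML::htmlParse(patched, asText = TRUE)
  tab <- XML::readHTMLTable(doc, which = 1, stringsAsFactors = FALSE)
  print(tab)
}
```

If the page writes the tag as "<br />" or "<br>" instead, a regular
expression such as "<br */?>" (without fixed = TRUE) would probably be
the safer pattern.
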
> > > If you aren't aware of one, I can
> > > gsub("<br/>", "\n", ...) on the string
> > > for each political office before
> > > passing it to "XML::readHTMLTable". I
> > > just tested this: It works.
> > Such a great hack! IMHO, this is much
> > more flexible than using
> > xml2::read_html, rvest::html_table,
> > dplyr::mutate like here
> >  https://stackoverflow.com/questions/38707669/how-to-read-an-html-table-and-account-for-line-breaks-within-cells
> And I added my solution to this
> problem to this Stackoverflow thread.
I wish you many upvotes, alas the
political competition is obviously not
tough there, as the other guy just got
one down vote.