[R] [External] Re: help with web scraping
Rasmus Liland
jr@| @end|ng |rom po@teo@no
Sun Jul 26 17:43:49 CEST 2020
Dear GRAVES et al.,
On 2020-07-25 12:43 -0500, Spencer Graves wrote:
> Dear Rasmus Liland et al.:
>
> On 2020-07-25 11:30, Rasmus Liland wrote:
> > On 2020-07-25 09:56 -0500, Spencer Graves wrote:
> > > Dear Rasmus et al.:
> >
> > It is LILAND et al., is it not? ... else it's customary to
> > put a comma in there, isn't it? ...
>
> The APA Style recommends "Sharp et al., 2007":
>
> https://blog.apastyle.org/apastyle/2011/11/the-proper-use-of-et-al-in-apa-style.html
If "Sharp et al., 2007" is an APA
citation of this book[*], Sharp is John A
Sharp's surname, Liland is my surname.
Q.E.D.
I have not used APA before (as I am not
a Psychiatrist), as the minimalism of
IEEE[**] always seemed more desirable.
> Regarding Confucius, I'm confused.
Never mind, just fooling around, that's
all.
> > On 2020-07-25 04:10, Rasmus Liland wrote:
> > >
> > > However, this suppressed "<br/>"
> > > everywhere.
> >
> > Why is that, please explain.
>
> I don't know why the Missouri
> Secretary of State's web site includes
> "<br/>" to signal a new line, but it
> does.
Me neither! On top of that, the
self-closing <br /> is actually[***]
XHTML syntax; plain HTML writes <br>,
though HTML5 accepts both.
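
Should a page ever mix the spellings, a
pattern that tolerates <br>, <br/> and
<br /> alike might look like the sketch
below. normalise_br is a made-up helper
name (not part of XML or rvest), and the
cell text is invented for illustration.

```r
# Hypothetical helper: replace any spelling of the br tag with "\n".
normalise_br <- function(x) {
  # "<br", optional whitespace, optional "/", then ">"; case-insensitive
  gsub("<br\\s*/?>", "\n", x, ignore.case = TRUE, perl = TRUE)
}

# Invented cell content mixing the three spellings.
cell <- "Jane Doe<br/>123 Main St<BR />Springfield<br>MO"
cat(normalise_br(cell), "\n")
```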
> I also don't know why
> XML::readHTMLTable suppressed "<br/>"
> everywhere it occurred, but it did
> that.
Yes, I know, I also observed this. But
now we swiftly solved this by gsubbing it
with the newline char, "\n", which means
nothing to HTML parsers anyway.
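
For the archive, a minimal sketch of that
hack on a made-up table. The HTML string
and the names in it are invented for
illustration, not taken from the Missouri
site, and the parsing step only runs if
the XML package is installed.

```r
# Made-up HTML fragment standing in for one political office's table.
html <- paste0(
  "<table>",
  "<tr><th>Candidate</th><th>Address</th></tr>",
  "<tr><td>Jane Doe</td><td>123 Main St<br/>Springfield</td></tr>",
  "</table>"
)

# Swap each "<br/>" for a newline before parsing; fixed = TRUE
# matches the literal string instead of treating it as a regex.
html_nl <- gsub("<br/>", "\n", html, fixed = TRUE)

# Parse only when the XML package is available.
if (requireNamespace("XML", quietly = TRUE)) {
  doc <- XML::htmlParse(html_nl, asText = TRUE)
  tab <- XML::readHTMLTable(doc, which = 1, stringsAsFactors = FALSE)
  print(tab)
}
```

Doing the substitution on the raw string
keeps it independent of whichever parser
reads the table afterwards.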
> > > If you aren't aware of one, I can
> > > gsub("<br/>", "\n", ...) on the string
> > > for each political office before
> > > passing it to "XML::readHTMLTable". I
> > > just tested this: It works.
> >
> > Such a great hack! IMHO, this is much
> > more flexible than using
> > xml2::read_html, rvest::html_table,
> > dplyr::mutate like here[1]
> >
> > [1] https://stackoverflow.com/questions/38707669/how-to-read-an-html-table-and-account-for-line-breaks-within-cells
>
> And I added my solution to this
> problem to this Stackoverflow thread.
I wish you many upvotes. Alas, the
political competition is obviously not
tough there, as the other guy just got
one downvote.
[*] https://www.amazon.co.uk/Management-Student-Research-Project/dp/0566084902
[**] https://pitt.libguides.com/citationhelp/ieee
[***] https://stackoverflow.com/questions/1946426/html-5-is-it-br-br-or-br