[Rd] as.Date (and strptime?) does not recognize "&nbsp;" as a blank
g@bembecker @end|ng |rom gm@||@com
Thu Jul 7 19:42:34 CEST 2022
Depends a bit on what you mean by "automatically". This seems to work for
me (note this has NOT been extensively tested on different OSes or even in
library(XML)
myhtml <- "<html><body><table
doc <- htmlParse(myhtml, asText = TRUE)
oldway <- readHTMLTable(doc, trim = FALSE)
identical(oldway$hiya$colname, " 2 Mar 2018") # FALSE :(
# 0xC2 0xA0 is the UTF-8 encoding of U+00A0 (no-break space)
decode_nbsp <- function(x) gsub(rawToChar(as.raw(c(0xc2, 0xa0))), " ", x,
                                fixed = TRUE, useBytes = TRUE)
fancypants <- function(node) decode_nbsp(xmlValue(node))
newandfancy <- readHTMLTable(doc, trim = FALSE, elFun = fancypants)
identical(newandfancy$hiya$colname, " 2 Mar 2018") # TRUE
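
For strings that have already been decoded (as in the as.Date example quoted below), the same idea can be sketched in base R alone. This is a minimal sketch, not something from the thread: nbsp_to_space is a hypothetical helper name, and it assumes the input is a UTF-8 string whose odd "space" is U+00A0. Note that %b is locale-dependent, so the as.Date call is only expected to parse "Mar" in an English locale.

```r
# Hypothetical helper (not from the thread): replace U+00A0 (no-break space)
# with an ordinary ASCII space.
nbsp_to_space <- function(x) gsub("\u00a0", " ", x, fixed = TRUE)

# The problem string as it looks after HTML decoding: U+00A0, then the date.
pblm <- "\u00a02 Mar 2018"

# Diagnosis: the first character is code point 160, not 32.
utf8ToInt(substr(pblm, 1, 1))

# Fix: normalize the blank, then parse as usual.
clean <- nbsp_to_space(pblm)
identical(clean, " 2 Mar 2018")
as.Date(clean, format = "%e %b %Y")  # %b depends on the current locale
```

Checking utf8ToInt() on a suspicious character is a quick way to tell a no-break space from a regular one before reaching for a workaround.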
On Fri, Jun 24, 2022 at 11:48 PM Spencer Graves <spencer.graves using prodsyse.com> wrote:
> p.s. Is there a way to get XML::readHTMLTable to automatically convert
> "&nbsp;" to a normal blank space?
> On 6/25/22 1:37 AM, Spencer Graves wrote:
> > Hello, All:
> > When is a space not a space?
> > Consider the following:
> > > (pblmDate <- textutils::HTMLdecode("&nbsp;2 Mar 2018"))
> > [1] " 2 Mar 2018"
> > > as.Date(pblmDate, format='%e %b %Y')
> > [1] NA
> > > as.Date(' 2 Mar 2018', format='%e %b %Y')
> > [1] "2018-03-02"
> > Is this a feature or a bug?
> > I can work around it, now that I know what it is, but it took me
> > a few hours to diagnose.
> > Thanks,
> > Spencer Graves
> > p.s. I got this from scraping a website with code that had worked for
> > me roughly 20 months ago. I suspect that in the interim, someone
> > probably replaced ' 2 Mar 2018' with "&nbsp;2 Mar 2018".
> > ______________________________________________
> > R-devel using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel