[Rd] as.Date (and strptime?) does not recognize "&nbsp;" as a blank

Spencer Graves spencer.graves using prodsyse.com
Sat Jun 25 15:13:26 CEST 2022


Hi, Maxim et al.:


On 6/25/22 6:10 AM, Maxim Nazarov wrote:
> Hello,
> 
>> When is a space not a space?
> I guess the answer is when it is a non-breaking one?..
> 
> We can observe:
>   > charToRaw(textutils::HTMLdecode("&nbsp;"))
>   [1] c2 a0
>   > charToRaw(" ")
>   [1] 20
> So one can argue that everything works correctly - the `textutils` function converts HTML's non-breaking space '&nbsp;' into R's non-breaking space '\u00a0' (bytes c2 a0 in UTF-8), while the %e format of as.Date expects a 'normal' space.
> But this is obviously not user-friendly especially since both symbols are displayed the same way on the console.
> So your options might be to either:
>   * manually change all 'weird' spaces into normal ones with something like gsub("\\h", " ", ..., perl = TRUE) - for the list of other weird spaces see https://www.pcre.org/original/doc/html/pcrepattern.html#genericchartypes
>   * persuade the textutils author to change &nbsp; into a normal space (they seem to be working with a simple lookup table - https://github.com/enricoschumann/textutils/blob/b813c7bd4b55daef5fa7612e3fbfe82962711940/R/char_refs.R#L1465-L1466)
>   * persuade R-Core (or submit a PR) to relax expectations of as.Date/strptime
> 

	  Thanks for the reply.  Since "this is obviously not user-friendly", 
as you noted, I felt a need to bring it to the attention of this group, 
and let them decide what if anything they would want to do about it.


	  In any event, I found a fix for my immediate problem.  It's not as 
elegant as yours, but it works.
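
	  For reference, the gsub() option quoted above can be sketched roughly
as follows.  This is an illustration only and not necessarily the fix
referred to here: the input is built by hand, with "\u00a0" standing in
for the output of textutils::HTMLdecode(), the name cleanDate is made up
for the example, and the final as.Date() call assumes a locale with
English month abbreviations.

```r
# Hand-built stand-in for the decoded string (assumption: the leading
# character is U+00A0, as the charToRaw() output "c2 a0" suggests)
pblmDate <- paste0("\u00a0", "2 Mar 2018")

# Diagnose: the first code point is 160 (U+00A0), not 32 (ordinary space)
utf8ToInt(pblmDate)[1]

# Replace every horizontal-whitespace character with an ordinary space
cleanDate <- gsub("\\h", " ", pblmDate, perl = TRUE)

# With an ordinary leading space, %e should now accept the day field
# (month parsing via %b depends on the locale)
as.Date(cleanDate, format = "%e %b %Y")
```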

	  Best Wishes,
	  Spencer




> Kind regards,
> Maxim Nazarov
> 
> ----- On Jun 25, 2022, at 8:37 AM, Spencer Graves spencer.graves using prodsyse.com wrote:
> 
>> Hello, All:
>>
>>
>> 	  When is a space not a space?
>>
>>
>> 	  Consider the following:
>>
>>
>>> (pblmDate <- textutils::HTMLdecode("&nbsp;2 Mar 2018"))
>> [1] " 2 Mar 2018"
>>> as.Date(pblmDate, format='%e %b %Y')
>> [1] NA
>>> as.Date(' 2 Mar 2018', format='%e %b %Y')
>> [1] "2018-03-02"
>>
>>
>> 	  Is this a feature or a bug?
>>
>>
>> 	  I can work around it, now that I know what it is, but it took me a
>> few hours to diagnose.
>>
>>
>> 	  Thanks,
>> 	  Spencer Graves
>>
>>
>> p.s.  I got this from scraping a website with code that had worked for
>> me roughly 20 months ago.  I suspect that in the interim, someone
>> probably replaced ' 2 Mar 2018' with "&nbsp;2 Mar 2018".
>>
>> ______________________________________________
>> R-devel using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel


