[R-pkg-devel] Windows R 4.2.0 package will not load with UTF-8 encoding

Joseph Park josephpark using ieee.org
Sat Jun 11 20:19:49 CEST 2022


It looks like Hiroaki identified the issue.

When the C++ std::regex code is removed from the underlying API, the
problem seems solved. Thank you!

The symptoms observed match those described in the tesseract issue thread.
The solution outlined in the gcc bug report seems the most prudent course:
Don't use std::regex.  I'll work on that and see if it resolves the issue.

Comment #6 seems relevant:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98723#c6
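
As an illustration of what "don't use std::regex" can look like in practice, here is a minimal sketch that splits a delimited string using only std::string search functions. The helper name SplitTokens and its default delimiter set are hypothetical and are not taken from the rEDM sources. That the removed regex code did tokenizing of this kind is also an assumption, suggested only by arguments such as lib = "1 500" in the session quoted below.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from the rEDM sources): split `input` on any
// character found in `delimiters`, skipping empty fields. It relies only on
// std::string searches, so no std::regex or locale-dependent collation is used.
std::vector<std::string> SplitTokens(const std::string& input,
                                     const std::string& delimiters = " \t,") {
    std::vector<std::string> tokens;

    // Position of the first character that is not a delimiter.
    std::string::size_type start = input.find_first_not_of(delimiters);

    while (start != std::string::npos) {
        // Position of the delimiter that ends the current token, if any.
        std::string::size_type end = input.find_first_of(delimiters, start);

        if (end == std::string::npos) {
            tokens.push_back(input.substr(start));  // last token: take the rest
            break;
        }
        tokens.push_back(input.substr(start, end - start));

        // Skip the run of delimiters before the next token.
        start = input.find_first_not_of(delimiters, end);
    }
    return tokens;
}
```

Calling SplitTokens("1 500") yields the two tokens "1" and "500". Patterns that need real regular-expression features would need a different replacement; this sketch covers only simple delimiter splitting.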

Again: Thank you!

On Sat, Jun 11, 2022 at 8:49 AM Duncan Murdoch <murdoch.duncan using gmail.com>
wrote:

> On 11/06/2022 6:43 a.m., Joseph Park wrote:
> > Thank you for the check of the CRAN builds.  I also checked that as a
> > first step.  Perhaps there is some difference between the CRAN setups,
> > as I have reproduced this on 3 Windows 10 machines with clean installs
> > of R 4.2.0, and it has been reported by other users.  I also noted in
> > the post that building and installing via devtools reports success
> > (  ** testing if installed package can be loaded from temporary location ),
> > however, a subsequent attempt to load hangs.
>
> One possible difference is the version of Windows 10.  The UTF8 handling
> was described in the NEWS file this way:
>
> "R uses UTF-8 as the native encoding on recent Windows systems (at least
> Windows 10 version 1903, Windows Server 2022 or Windows Server 1903). As
> a part of this change, R uses UCRT as the C runtime. UCRT should be
> installed manually on systems older than Windows 10 or Windows Server
> 2016 before installing R."
>
> Conceivably the systems where this fails don't have the new UCRT
> runtime.  I believe running Windows Update should get it.
>
> If it doesn't, or for users on an older Windows version, this page lets
> you download it:
> https://www.microsoft.com/en-us/download/details.aspx?id=48234 .
>
>
> Duncan Murdoch
>
> >
> > On Sat, Jun 11, 2022 at 6:33 AM Joseph Park <josephpark using ieee.org> wrote:
> >
> >> Apologies for the pages of minutiae.  I endeavored to post a reproducible
> >> example. I'm unable to show the failure since it simply hangs at the
> >> prompt with CPU spinning and memory cyclically ramping and declining.
> >> One has to kill R. The posted commands show the workaround, not the
> >> failure.
> >>
> >> I since found that just changing the LC_COLLATE is enough to allow the
> >> library to load :
> >>> Sys.setlocale('LC_COLLATE','English')
> >> [1] "English_United States.1252"
> >>> Sys.getlocale()
> >> [1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
> >> States.utf8;LC_MONETARY=English_United
> >> States.utf8;LC_NUMERIC=C;LC_TIME=English_United States.utf8"
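
For readers who want to see what the LC_COLLATE workaround above looks like at the C++ level, here is a minimal sketch of a scoped guard built on std::setlocale. The class name ScopedCollateLocale is hypothetical. Whether switching the collation locale from native code avoids the hang is not established in this thread, so this only mirrors the R-level Sys.setlocale call. Note that std::setlocale changes process-wide state and is not thread-safe.

```cpp
#include <clocale>
#include <string>

// Hypothetical RAII guard: sets LC_COLLATE to `temporary` on construction and
// restores the previous setting on destruction.
class ScopedCollateLocale {
public:
    explicit ScopedCollateLocale(const char* temporary = "C") {
        // Passing a null pointer only queries the current setting.
        const char* current = std::setlocale(LC_COLLATE, nullptr);
        if (current != nullptr) {
            saved_ = current;  // copy: later setlocale calls may reuse the buffer
            have_saved_ = true;
        }
        // setlocale returns a null pointer if the request cannot be honored.
        switched_ = std::setlocale(LC_COLLATE, temporary) != nullptr;
    }

    ~ScopedCollateLocale() {
        if (have_saved_) {
            std::setlocale(LC_COLLATE, saved_.c_str());
        }
    }

    ScopedCollateLocale(const ScopedCollateLocale&) = delete;
    ScopedCollateLocale& operator=(const ScopedCollateLocale&) = delete;

    // True if the temporary locale was actually applied.
    bool switched() const { return switched_; }

private:
    std::string saved_;
    bool have_saved_ = false;
    bool switched_ = false;
};
```

A guard object declared at the top of a block keeps LC_COLLATE at "C" until the block exits, after which the earlier setting is restored.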
> >>
> >> Again, apologies for my naivety.
> >>
> >> On Sat, Jun 11, 2022 at 6:16 AM Duncan Murdoch <murdoch.duncan using gmail.com>
> >> wrote:
> >>
> >>> On 11/06/2022 5:02 a.m., Joseph Park wrote:
> >>>> Dear R package developers,
> >>>>
> >>>> Starting with R 4.2.0, package rEDM
> >>>> (https://cran.r-project.org/package=rEDM) will not load
> >>>> [library( rEDM )] on Windows with the default UTF-8 encoding.
> >>>>
> >>>> When the locale is changed from UTF-8 to non-UTF-8, the package loads
> >>>> and runs. One can also change the locale to non-UTF-8, load the package,
> >>>> detach and unload the package, change the locale back to UTF-8, then
> >>>> load and run without issue.
> >>>>
> >>>> Note that installation from source reports:
> >>>>      ** testing if installed package can be loaded from temporary location
> >>>> and completes (record below).
> >>>>
> >>>> This package uses Rcpp to wrap a C++ API.
> >>>>
> >>>> Having searched here and in general, I don't find others experiencing
> >>>> this issue.
> >>>>
> >>>> I have tried:
> >>>>     Ensuring all source files are UTF-8 encoded
> >>>>     Removing non-ASCII characters from all source files
> >>>>     Specifying non-ASCII characters with \uXXXX
> >>>>     Checking vignette encoding
> >>>>     Adding "Encoding : UTF-8" to DESCRIPTION
> >>>>
> >>>> Please excuse my encoding and Windows naivety.
> >>>>
> >>>> Here is a demonstration changing the encoding to load the package,
> >>>> along with unloading & reloading under UTF-8:
> >>>> --
> >>>>> sessionInfo()
> >>>> R version 4.2.0 (2022-04-22 ucrt)
> >>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
> >>>> Running under: Windows 10 x64 (build 19044)
> >>>>
> >>>> Matrix products: default
> >>>>
> >>>> locale:
> >>>> [1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8
> >>>> [3] LC_MONETARY=English_United States.utf8 LC_NUMERIC=C
> >>>> [5] LC_TIME=English_United States.utf8
> >>>>
> >>>> attached base packages:
> >>>> [1] stats     graphics  grDevices utils     datasets  methods   base
> >>>>
> >>>> loaded via a namespace (and not attached):
> >>>> [1] compiler_4.2.0
> >>>>>
> >>>>> Sys.setlocale('LC_ALL','English')
> >>>> [1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
> >>>> States.1252;LC_MONETARY=English_United
> >>>> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
> >>>> Warning message:
> >>>> In Sys.setlocale("LC_ALL", "English") :
> >>>>     using locale code page other than 65001 ("UTF-8") may cause problems
> >>>>>
> >>>>> sessionInfo()
> >>>> R version 4.2.0 (2022-04-22 ucrt)
> >>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
> >>>> Running under: Windows 10 x64 (build 19044)
> >>>>
> >>>> Matrix products: default
> >>>>
> >>>> locale:
> >>>> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
> >>>> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> >>>> [5] LC_TIME=English_United States.1252
> >>>> system code page: 65001
> >>>>
> >>>> attached base packages:
> >>>> [1] stats     graphics  grDevices utils     datasets  methods   base
> >>>>
> >>>> loaded via a namespace (and not attached):
> >>>> [1] compiler_4.2.0
> >>>>>
> >>>>> library( rEDM )
> >>>>>
> >>>>> sessionInfo()
> >>>> R version 4.2.0 (2022-04-22 ucrt)
> >>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
> >>>> Running under: Windows 10 x64 (build 19044)
> >>>>
> >>>> Matrix products: default
> >>>>
> >>>> locale:
> >>>> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
> >>>> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> >>>> [5] LC_TIME=English_United States.1252
> >>>> system code page: 65001
> >>>>
> >>>> attached base packages:
> >>>> [1] stats     graphics  grDevices utils     datasets  methods   base
> >>>>
> >>>> other attached packages:
> >>>> [1] rEDM_1.12.2.1.0
> >>>>
> >>>> loaded via a namespace (and not attached):
> >>>> [1] compiler_4.2.0 Rcpp_1.0.8.3
> >>>>>
> >>>>
> >>>> ### All package tests pass....
> >>>> ### Now detach and unload, change to UTF-8, and load
> >>>>
> >>>>> detach( 'package:rEDM', unload = TRUE )
> >>>>>
> >>>>> Simplex( dataFrame = Lorenz5D, columns = 'V1', target = 'V2', lib = "1 500", pred = "501 505", E = 5 )
> >>>> Error in Simplex(dataFrame = Lorenz5D, columns = "V1", target = "V2",  :
> >>>>     could not find function "Simplex"
> >>>
> >>> I don't see any attempt to load the package.  You attempted to use the
> >>> function Simplex and it was not found.  That indicates the package is
> >>> not loaded, but not why.
> >>>
> >>> What you should show are the messages you get when you start a clean
> >>> copy of R and immediately attempt to load the package using library().
> >>> It's helpful that you posted sessionInfo(); I'd include that again with
> >>> the new information, in case anything is different.
> >>>
> >>> Duncan Murdoch
> >>>
> >>>
> >
> >       [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-package-devel using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-package-devel
>
>
