[Rd] Antwort: Re: R on Windows crashes when using certain

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue Dec 15 17:09:32 CET 2009


A few comments, though (I've been offline through much of this, 
and away from a Windows machine for almost all).

1) You could have narrowed down the cause by saving and restarting the 
session.  In particular it would have shown that the issue was not in 
sub() as you reported, since saving the object after the sub() call 
and starting a new session caused the problem in the second session.

2) Using gctorture() makes such things happen on much smaller problems 
and more reliably (if no faster).  (The underlying cause was more than 
one missing PROTECT.)

3) In 2.10.x the difference between fixed = TRUE (which you should 
have used in the first place) and the extended and PCRE versions often 
lies in the encoding of the result: use Encoding() to find out.  Not 
only is fixed = TRUE much faster, it also avoids repeated re-encodings.

4) Using UTF-8 encoded strings in a non-UTF-8 locale (and in 
particular on Windows) is a convenience but has performance 
implications.  Unless you need text not representable in the current 
locale, convert your strings to the current charset.  If you are using 
non-ASCII text and an 8-bit locale (e.g. CP1252 on Windows) then 
regexp computations will work somewhat faster in R-devel since they 
are performed in bytes (whereas 2.10.x uses wchar_t and for [g]sub 
returns the result in UTF-8).

5) These reports show yet again that people are not doing enough to 
help in the alpha/beta testing period of 2.x.0.  The R developers are 
almost exclusively using ASCII data or UTF-8 locales, so people doing 
extensive text processing in other locales please do take note of 
requests to test new versions of R.
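
A rough sketch of what points 1 and 2 might look like in practice.  This 
is not the code used to track down the bug: it is a scaled-down version 
of the example from the report quoted below, and the sample size, the 
temporary file name and the choice of objects to save are assumptions 
made for illustration.

```r
# Illustrative sketch only: a scaled-down version of the reported example.
set.seed(1)
n <- 50                                # far smaller than the 1e5 in the report
k <- 10
x <- sample(k, n, replace = TRUE)
y <- sample(k, n, replace = TRUE)
xy <- paste(x, y, sep = " \u00d7 ")    # U+00D7 is the multiplication sign
z <- sample(n)

# Point 1: save the intermediate objects.  Calling load() on this file in a
# fresh session and running only the last step there shows whether the
# problem depends on the calls that built the objects.
f <- tempfile(fileext = ".RData")      # hypothetical file location
save(xy, z, file = f)

# Point 2: with gctorture(TRUE) a garbage collection is forced at (nearly)
# every allocation, so a missing PROTECT tends to show up on small inputs.
# This is slow; the finally clause always switches it off again.
gctorture(TRUE)
d <- tryCatch(data.frame(xy, z), finally = gctorture(FALSE))
```

On a version of R containing the fix this should simply run to completion, 
if slowly.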

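For points 3 and 4, a minimal sketch of inspecting and converting 
encodings.  The strings and variable names are made up for illustration, 
and the values reported by Encoding() depend on the R version and the 
locale, so none are shown here.

```r
# Illustrative sketch only: strings and names are made up.
x <- c("1 \u00d7 2", "3 \u00d7 4")   # U+00D7 is the multiplication sign
Encoding(x)                           # declared encoding of each element

# Point 3: a literal replacement does not need a regular expression.
y <- sub(" \u00d7 ", " x ", x, fixed = TRUE)
Encoding(y)                           # compare with Encoding(x)

# Point 4: convert to the encoding of the current locale.  Characters that
# the locale cannot represent are not converted faithfully.
x_native <- enc2native(x)
Encoding(x_native)
```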

On Tue, 15 Dec 2009, g.russell at eos-solutions.com wrote:

> The new version of R-devel from yesterday morning seems to have fixed bug
> 14114! Thanks a lot for your help.
>
> Duncan Murdoch <murdoch at stats.uwo.ca> wrote on 14.12.2009 13:34:35:
>
>> On 10/12/2009 4:20 AM, karl at huftis.org wrote:
>>> Full_Name: Karl Ove Hufthammer
>>> Version: 2.10.0
>>> OS: Windows XP
>>> Submission from: (NULL) (93.124.134.66)
>>>
>>>
>>> I have found a rather strange bug in R 2.10.0 on Windows, where the choice of
>>> characters used in a string make R crash (i.e., Windows shows a dialogue saying
>>> that the application has a problem, and must be closed).
>>
>> This was related to encoding changes.  It likely appeared
>> Windows-specific because Windows uses a different default encoding than
>> most Linux systems.  I believe it is fixed now in R-devel, and it will
>> soon make it into 2.10.1-patched, but it came too late to make it into
>> today's release.
>>
>> I believe PR#14114 was the same issue and is also fixed, but I did less
>> testing of it.  I'd appreciate it if those who saw either bug in real
>> code test the patches.  They should be in today's tarball of R-devel,
>> and did make it into the Windows binary build of R-devel this morning.
>>
>> Duncan
>>
>>>
>>> I can reproduce the bug on two separate systems running Windows XP, and with
>>> both R 2.10.0 and the latest R.2.10.1 RC.
>>>
>>> The following commands trigger the crash for me:
>>>
>>> n=1e5
>>> k=10
>>> x=sample(k,n,replace=TRUE)
>>> y=sample(k,n,replace=TRUE)
>>> xy=paste(x,y,sep=" × ")
>>> z=sample(n)
>>> d=data.frame(xy,z)
>>>
>>> The last step takes very long time, and R crashes before it's finished. Note
>>> that if I reduce n, the problem disappears. Also, if I change the × (a
>>> multiplication symbol) to a x (a letter), the problem also disappears (and the
>>> last command takes almost no time to run).
>>>
>>> I originally discovered this (or a related?) bug while using 'unique' on a data
>>> frame similar to the 'd' data frame defined above, where R would often, but not
>>> always, crash.
>>>
>>>> sessionInfo()
>>> R version 2.10.0 (2009-10-26)
>>> i386-pc-mingw32
>>>
>>> locale:
>>> [1] LC_COLLATE=Norwegian-Nynorsk_Norway.1252
>>> [2] LC_CTYPE=Norwegian-Nynorsk_Norway.1252
>>> [3] LC_MONETARY=Norwegian-Nynorsk_Norway.1252
>>> [4] LC_NUMERIC=C
>>> [5] LC_TIME=Norwegian-Nynorsk_Norway.1252
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
