[Rd] Antwort: Re: R on Windows crashes when using certain
Prof Brian Ripley
ripley at stats.ox.ac.uk
Tue Dec 15 17:09:32 CET 2009
A few comments, though (I've been offline through much of this,
and away from a Windows machine for almost all).
1) You could have narrowed down the cause by saving and restarting the
session. In particular it would have shown that the issue was not in
sub() as you reported, since saving the object after the sub() call
and starting a new session caused the problem in the second session.
2) Using gctorture() makes such things happen on much smaller problems
and more reliably (if no faster). (The underlying cause was more than
one missing PROTECT.)
3) In 2.10.x, the difference between fixed = TRUE (which you should
have used in the first place) and the extended and PCRE versions is
often in the encoding of the result: use Encoding() to find out. Not
only is fixed = TRUE much faster, it avoids repeated re-encodings.
4) Using UTF-8 encoded strings in a non-UTF-8 locale (and in
particular on Windows) is a convenience but has performance
implications. Unless you need text not representable in the current
locale, convert your strings to the current charset. If you are using
non-ASCII text and an 8-bit locale (e.g. CP1252 on Windows) then
regexp computations will work somewhat faster in R-devel since they
are performed in bytes (whereas 2.10.x uses wchar_t and for [g]sub
returns the result in UTF-8).
5) These reports show yet again that people are not doing enough to
help in the alpha/beta testing period of 2.x.0. The R developers are
almost exclusively using ASCII data or UTF-8 locales, so people doing
extensive text processing in other locales please do take note of
requests to test new versions of R.
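
As a rough illustration of points 1) to 4), something along the following
lines could be tried. It is a sketch only: the sample strings and the
object names (x, y, u, f, e) are made up for illustration and are not
taken from the original report, and only base R functions are used.

```r
# Made-up sample data; "\u00d7" is the multiplication sign.
x <- c("3 \u00d7 4", "10 \u00d7 2", "3 \u00d7 4")

# Point 1: save the object, then load it again. To narrow down a crash,
# the load() would be run in a freshly started session; here it is done
# into a separate environment purely to show the calls.
f <- tempfile(fileext = ".RData")
save(x, file = f)
e <- new.env()
load(f, envir = e)

# Point 2: gctorture(TRUE) forces a garbage collection at (nearly) every
# allocation, so a missing PROTECT tends to show up on small inputs.
# It is slow, so switch it off again straight away.
gctorture(TRUE)
u <- unique(x)
gctorture(FALSE)

# Point 3: a literal replacement needs no regular expression.
# Encoding() reports the declared encoding of each element.
y <- sub(" \u00d7 ", " x ", x, fixed = TRUE)
print(Encoding(x))
print(Encoding(y))

# Point 4: convert to the native charset where the text is representable.
x_native <- enc2native(x)
print(Encoding(x_native))
```

What Encoding() reports for x_native depends on the locale the session
runs in, so no particular output is assumed here.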
On Tue, 15 Dec 2009, g.russell at eos-solutions.com wrote:
> The new version of R-devel from yesterday morning seems to have fixed bug
> 14114! Thanks a lot for your help.
>
> Duncan Murdoch <murdoch at stats.uwo.ca> wrote on 14.12.2009 13:34:35:
>
>> On 10/12/2009 4:20 AM, karl at huftis.org wrote:
>>> Full_Name: Karl Ove Hufthammer
>>> Version: 2.10.0
>>> OS: Windows XP
>>> Submission from: (NULL) (93.124.134.66)
>>>
>>>
>>> I have found a rather strange bug in R 2.10.0 on Windows, where the choice of
>>> characters used in a string make R crash (i.e., Windows shows a dialogue saying
>>> that the application has a problem, and must be closed).
>>
>> This was related to encoding changes. It likely appeared
>> Windows-specific because Windows uses a different default encoding than
>> most Linux systems. I believe it is fixed now in R-devel, and it will
>> soon make it into 2.10.1-patched, but it came too late to make it into
>> today's release.
>>
>> I believe PR#14114 was the same issue and is also fixed, but I did less
>> testing of it. I'd appreciate it if those who saw either bug in real
>> code test the patches. They should be in today's tarball of R-devel,
>> and did make it into the Windows binary build of R-devel this morning.
>>
>> Duncan
>>
>>>
>>> I can reproduce the bug on two separate systems running Windows XP, and with
>>> both R 2.10.0 and the latest R.2.10.1 RC.
>>>
>>> The following commands trigger the crash for me:
>>>
>>> n=1e5
>>> k=10
>>> x=sample(k,n,replace=TRUE)
>>> y=sample(k,n,replace=TRUE)
>>> xy=paste(x,y,sep=" × ")
>>> z=sample(n)
>>> d=data.frame(xy,z)
>>>
>>> The last step takes very long time, and R crashes before it's finished. Note
>>> that if I reduce n, the problem disappears. Also, if I change the × (a
>>> multiplication symbol) to a x (a letter), the problem also disappears (and the
>>> last command takes almost no time to run).
>>>
>>> I originally discovered this (or a related?) bug while using 'unique' on a data
>>> frame similar to the 'd' data frame defined above, where R would often, but not
>>> always, crash.
>>>
>>> > sessionInfo()
>>> R version 2.10.0 (2009-10-26)
>>> i386-pc-mingw32
>>>
>>> locale:
>>> [1] LC_COLLATE=Norwegian-Nynorsk_Norway.1252
>>> [2] LC_CTYPE=Norwegian-Nynorsk_Norway.1252
>>> [3] LC_MONETARY=Norwegian-Nynorsk_Norway.1252
>>> [4] LC_NUMERIC=C
>>> [5] LC_TIME=Norwegian-Nynorsk_Norway.1252
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595