[R] Stack overflow in R 2.10.0 with sub()

Kenneth Roy Cabrera Torres krcabrer at une.net.co
Tue Oct 27 19:16:21 CET 2009


On Tue, 27-10-2009 at 10:47 -0700, Phil Spector wrote:
> What happens if you type
> 
> Sys.setlocale('LC_ALL','C')
> 
> before using gsub or grep?

When I do that, R hangs and doesn't show any message.
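
A per-call alternative to switching the whole session to the C locale may be the useBytes argument of gsub(), which matches byte by byte and so skips the locale validity check. This is only a minimal sketch on a made-up vector (x_demo is hypothetical, not the real x), and I have not confirmed that it avoids the hang or the stack overflow on the full data:

```r
# Hypothetical sample standing in for x; "\xd1" is the latin1 byte for Ñ,
# which is not valid on its own in a UTF-8 locale.
x_demo <- c("21444777      ", "MU\xd1OZ   ", "  1147585")

# useBytes = TRUE makes the matching byte-wise, so the strings are not
# validated against the current locale before the regex is applied.
y_demo <- gsub("(^ +)|( +$)", "", x_demo, useBytes = TRUE)

# Byte lengths after trimming the leading and trailing spaces.
print(nchar(y_demo, type = "bytes"))
```

Since only ASCII spaces are being removed here, byte-wise matching should not change which characters are trimmed.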
> 
>  					- Phil Spector
>  					 Statistical Computing Facility
>  					 Department of Statistics
>  					 UC Berkeley
>  					 spector at stat.berkeley.edu
> 
> 
> On Tue, 27 Oct 2009, Kenneth Roy Cabrera Torres wrote:
> 
> > Thank you very much for your interest.
> >
> > I did this:
> > x <- as.character(alumnos$AL_NUME_ID)
> > x <- x[-seq_len(length(x)/2)]
> > save(x, file="x.RData")
> >
> > I exit from R, restart it, and run this:
> >
> > load("x.RData")
> > y <- gsub("(^ +)|( +$)","",x)
> >
> > It shows me:
> >
> > Error in gsub("(^ +)|( +$)", "", x) :
> >  input string 66644 is invalid in this locale
> >
> > I deleted that string (it contains an unusual character, Ñ).
> >
> > So I ran the command again without that observation.
> >
> > y <- gsub("(^ +)|( +$)","",x[-c(66644)])
> >
> > I got this:
> > Error in gsub("(^ +)|( +$)", "", x[-c(66644)]) :
> >  input string 160689 is invalid in this locale
> >
> > I ran it again, also excluding this second invalid string (I use
> >  position 160690 because removing element 66644 shifts the later
> >  indices of x by one):
> >
> >> y <- gsub("(^ +)|( +$)","",x[-c(66644,160690)])
> > Error: segfault from C stack overflow
> >
> > And it fails.
> >
> > I also repeated the whole process with this conversion first:
> >
> > x <- iconv(as.character(alumnos$AL_NUME_ID),"latin1","UTF-8")
> > x <- x[-seq_len(length(x)/2)]
> > save(x, file="x.RData")
> >
> > Then I exit, restart R, and type:
> >
> > load("x.RData")
> > y <- gsub("(^ +)|( +$)","",x)
> >
> > It fails again, this time without showing the "invalid string" errors.
> >
> > I then ran this:
> >
> > load("x.RData")
> > y <- gsub("(^ +)|( +$)","",x[1:160690])
> >
> > and it works, then I type
> >
> > y <- gsub("(^ +)|( +$)","",x[1:200000]) #(x length is 454035)
> >
> > and it works...
> >
> > But when I started a manual binary search,
> > I found something that still puzzles me.
> >
> > y <- gsub("(^ +)|( +$)","",x[1:261570])
> >
> > works, but sometimes fails (after I restart R);
> > it always fails with an index greater than 262000.
> >
> > I see nothing unusual around 261570.
> >
> > x[261560:261580]
> > [1] "21444777             " "1147585              " "255202522            "
> > [4] "25852100             " "24258550             " "A8D0251207           "
> > [7] "34681811             " "19121345             " "16921329             "
> > [10] "20442195             " "14506482             " "44332211             "
> > [13] "35049122             " "34326340             " "35182366             "
> > [16] "33288742             " "34958795             " "1017147202           "
> > [19] "3306985              " "33048501             " "33295073             "
> >
> > I am sending you the x.RData file to see if you can
> > reproduce my problem.
> >
> > This information may be useful:
> >
> > sessionInfo()
> >
> > R version 2.10.0 (2009-10-26)
> > x86_64-unknown-linux-gnu
> >
> > locale:
> > [1] LC_CTYPE=es_CO.UTF-8       LC_NUMERIC=C
> > [3] LC_TIME=es_CO.UTF-8        LC_COLLATE=es_CO.UTF-8
> > [5] LC_MONETARY=C              LC_MESSAGES=es_CO.UTF-8
> > [7] LC_PAPER=es_CO.UTF-8       LC_NAME=C
> > [9] LC_ADDRESS=C               LC_TELEPHONE=C
> > [11] LC_MEASUREMENT=es_CO.UTF-8 LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] stats     graphics  grDevices utils     datasets  methods   base
> >
> > R.Version()
> >
> > $platform
> > [1] "x86_64-unknown-linux-gnu"
> > $arch
> > [1] "x86_64"
> > $os
> > [1] "linux-gnu"
> > $system
> > [1] "x86_64, linux-gnu"
> > $status
> > [1] ""
> > $major
> > [1] "2"
> > $minor
> > [1] "10.0"
> > $year
> > [1] "2009"
> > $month
> > [1] "10"
> > $day
> > [1] "26"
> > $`svn rev`
> > [1] "50208"
> > $language
> > [1] "R"
> > $version.string
> > [1] "R version 2.10.0 (2009-10-26)"
> >
> > gcc --version and g++ --version show me:
> >
> > gcc (Ubuntu 4.3.3-5ubuntu4) 4.3.3
> > Copyright (C) 2008 Free Software Foundation, Inc.
> > This is free software; see the source for copying conditions.  There is NO
> > warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> >
> > When I compiled R, I used only this configure option (nothing more):
> >
> > ./configure --enable-R-shlib
> > make
> > sudo make install
> >
> > At the moment I have a 22 GB swap partition (monitoring shows that
> > the system is not using it) and 4 GB of RAM.
> >
> > Again, thank you very much for your help.
> >
> > Kenneth
> >
> >
> >
> >
> >
> >
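
Rather than discovering the invalid strings one error at a time, it may be possible to locate all of them in a single pass. The sketch below is only an illustration under stated assumptions: x_demo is a made-up vector (not the real x), validUTF8() is available only in much newer R releases than 2.10.0, and the offending entries are assumed to be latin1, as the iconv() call quoted above suggests:

```r
# Hypothetical sample; elements 2 and 4 contain the latin1 byte 0xd1 (Ñ),
# which is not valid UTF-8.
x_demo <- c("21444777", "MU\xd1OZ", "1147585", "NU\xd1O")

# validUTF8() returns FALSE for elements that are not valid UTF-8.
bad <- which(!validUTF8(x_demo))
print(bad)

# Re-encode only the offending elements, assuming they are latin1.
x_fixed <- x_demo
x_fixed[bad] <- iconv(x_demo[bad], from = "latin1", to = "UTF-8")

# Every element should now be valid UTF-8.
print(all(validUTF8(x_fixed)))
```

Converting only the flagged elements leaves entries that are already valid UTF-8 untouched, which avoids double-encoding them.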



