[R] parsing numeric values

Gabor Grothendieck ggrothendieck at gmail.com
Wed Nov 18 16:48:20 CET 2009


Thanks. This is now fixed in the development version so that it gives
an error rather than crashing:

> library(gsubfn)
Loading required package: proto
Loading required package: tcltk
Loading Tcl/Tk interface ... done
> source("http://gsubfn.googlecode.com/svn/trunk/R/gsubfn.R")
> strapply("test", as.numeric)
Error in as.character(pattern) :
  cannot coerce type 'builtin' to vector of type 'character'
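The error above comes from coercing a function to character, which in R is an ordinary, catchable error rather than a crash. The base-R sketch below shows that error and a stricter up-front check. The helper `check_pattern()` is hypothetical. It is not part of gsubfn and is not how gsubfn implements the fix.

```r
# Hypothetical helper (not part of gsubfn): validate a regex argument up front
# so a misplaced function produces an ordinary R error instead of a crash.
check_pattern <- function(pattern) {
  if (!is.character(pattern)) {
    stop("'pattern' must be a character string, not an object of type '",
         typeof(pattern), "'", call. = FALSE)
  }
  invisible(pattern)
}

# as.character() on a builtin function already signals a catchable error,
# the same message as in the session above.
msg1 <- tryCatch(as.character(as.numeric), error = conditionMessage)
print(msg1)

# The explicit check reports the same mistake with a more specific message.
msg2 <- tryCatch(check_pattern(as.numeric), error = conditionMessage)
print(msg2)
```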


On Wed, Nov 18, 2009 at 8:49 AM, baptiste auguie
<baptiste.auguie at googlemail.com> wrote:
> Thanks a lot, both of you.
>
> Incidentally, I made R crash when I forgot the X argument to strapply,
>
> library(gsubfn)
> Loading required package: tcltk
> Loading Tcl/Tk interface ... done
> strapply("test", as.numeric)
>
>  *** caught bus error ***
> address 0x13c, cause 'non-existent physical address'
>
> Traceback:
>  1: .External("dotTclcallback", ..., PACKAGE = "tcltk")
>  2: .Tcl.callback(x, e)
>  3: makeAtomicCallback(x, e)
>  4: makeCallback(get("value", envir = ref), get("envir", envir = ref))
>  5: FUN(X[[3L]], ...)
>  6: lapply(val, val2obj)
>  7: .Tcl.args.objv(...)
>  8: structure(.External("dotTclObjv", objv, PACKAGE = "tcltk"), class = "tclObj")
>  9: .Tcl.objv(.Tcl.args.objv(...))
> 10: tcl("set", "e", e)
> 11: strapply1(x, pattern, backref, ignore.case)
> 12: FUN("test"[[1L]], ...)
> 13: lapply(X, FUN, ...)
> 14: sapply(X, ff, simplify = is.logical(simplify) && simplify, USE.NAMES = USE.NAMES)
> 15: strapply("test", as.numeric)
>
> Possible actions:
> 1: abort (with core dump, if enabled)
> 2: normal R exit
> 3: exit R without saving workspace
> 4: exit R saving workspace
>
> sessionInfo()
> R version 2.10.0 (2009-10-26)
> i386-apple-darwin9.8.0
>
> locale:
> [1] en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  grid      methods
> [8] base
>
> other attached packages:
> [1] ggplot2_0.8.3  reshape_0.8.3  plyr_0.1.9     proto_0.3-8    fortunes_1.3-6
>
> 2009/11/18 Gabor Grothendieck <ggrothendieck at gmail.com>:
>> A minor variant might be the following:
>>
>>   library(gsubfn)
>>   strapply(input, "\\d+\\.\\d+E[-+]?\\d+", as.numeric, simplify = rbind)
>>
>> where:
>>
>> - as.numeric is used in place of c, in which case combine is not needed
>> - \\d+ matches one or more digits
>> - \\. matches a decimal point
>> - E matches the literal exponent marker
>> - [-+]? matches -, + or nothing (i.e. an optional sign)
>> - the parentheses around the regular expression are not needed
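For readers without gsubfn, the sketch below is a rough base-R counterpart to the strapply() call above. It applies the same regular expression with gregexpr() and regmatches() to the example input from the original question further down the thread, and returns one row per matching line. It is a substitute technique, not what strapply() does internally. `perl = TRUE` is a choice made here for the `\\d` shorthand, and lengths() needs R 3.2.0 or later.

```r
# Example log text from the original question
con <- textConnection("some text
 <ax> =    1.3770E-03     <bx> =    3.4644E-07
 <ay> =    1.9412E-04     <by> =    4.8840E-08

other text
 <aax>  =    1.3770E-03     <bbx> =    3.4644E-07
 <aay>  =    1.9412E-04     <bby> =    4.8840E-08")
input <- readLines(con)
close(con)

# Digits, decimal point, digits, literal E, optional sign, digits
pattern <- "\\d+\\.\\d+E[-+]?\\d+"

# All matches on each line (character(0) for lines without a number)
matches <- regmatches(input, gregexpr(pattern, input, perl = TRUE))

# Drop lines without matches, convert to numbers and stack the rows,
# roughly mirroring simplify = rbind in the strapply() call
matches <- matches[lengths(matches) > 0]
values <- do.call(rbind, lapply(matches, as.numeric))

values[, 1]  # the <a...> column, i.e. the values wanted in the question
```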
>>
>> On Wed, Nov 18, 2009 at 7:28 AM, Henrique Dallazuanna <wwwhsd at gmail.com> wrote:
>>> Try this:
>>>
>>> strapply(input, "([0-9]+\\.[0-9]+E-[0-9]+)", c, simplify = rbind,
>>> combine = as.numeric)
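The practical difference between this pattern and the variant suggested later in the thread is the sign after E. This one requires a literal minus, so it only matches negative exponents. The `[-+]?` form makes the sign optional. The base-R check below uses made-up sample strings and drops the capturing parentheses, which grepl() does not need.

```r
henrique <- "[0-9]+\\.[0-9]+E-[0-9]+"   # negative exponents only
gabor    <- "\\d+\\.\\d+E[-+]?\\d+"     # optional sign after E

x <- c("1.3770E-03", "2.5000E+02", "3.1E5", "5E-03")

grepl(henrique, x)            # TRUE FALSE FALSE FALSE
grepl(gabor, x, perl = TRUE)  # TRUE  TRUE  TRUE FALSE
```

Neither pattern matches a number written without a decimal point, such as 5E-03. That is fine for this log format but worth keeping in mind for other input.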
>>>
>>> On Wed, Nov 18, 2009 at 9:57 AM, baptiste auguie
>>> <baptiste.auguie at googlemail.com> wrote:
>>>> Dear list,
>>>>
>>>> I'm seeking advice on extracting some numeric values from a log file
>>>> created by an external program. Consider the following example,
>>>>
>>>> input <-
>>>> readLines(textConnection(
>>>> "some text
>>>>  <ax> =    1.3770E-03     <bx> =    3.4644E-07
>>>>  <ay> =    1.9412E-04     <by> =    4.8840E-08
>>>>
>>>> other text
>>>>  <aax>  =    1.3770E-03     <bbx> =    3.4644E-07
>>>>  <aay>  =    1.9412E-04     <bby> =    4.8840E-08"))
>>>>
>>>> ## this is what I want
>>>> results <- c(as.numeric(strsplit(grep("<ax>", input, value = TRUE), " ")[[1]][8]),
>>>>             as.numeric(strsplit(grep("<ay>", input, value = TRUE), " ")[[1]][8]),
>>>>             as.numeric(strsplit(grep("<aax>", input, value = TRUE), " ")[[1]][9]),
>>>>             as.numeric(strsplit(grep("<aay>", input, value = TRUE), " ")[[1]][9])
>>>>             )
>>>>
>>>> ## [1] 0.00137700 0.00019412 0.00137700 0.00019412
>>>>
>>>> The use of strsplit is not ideal here as there is a different number
>>>> of space characters in the lines containing <ax> and <aax> for
>>>> instance (hence the indices 8 and 9 respectively).
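One way to keep a strsplit() approach without depending on the exact spacing is to trim each line and split on runs of whitespace, so that the value is always the third field. This is a base-R sketch, not a method proposed in the thread. trimws() needs R 3.2.0 or later. Selecting lines whose first field starts with `<a` and whose second field is `=` is an assumption tailored to this example.

```r
con <- textConnection("some text
 <ax> =    1.3770E-03     <bx> =    3.4644E-07
 <ay> =    1.9412E-04     <by> =    4.8840E-08

other text
 <aax>  =    1.3770E-03     <bbx> =    3.4644E-07
 <aay>  =    1.9412E-04     <bby> =    4.8840E-08")
input <- readLines(con)
close(con)

# Trim leading/trailing blanks, then split on one or more whitespace characters,
# so "<ax> =    1.3770E-03" and "<aax>  =    1.3770E-03" split the same way
fields <- strsplit(trimws(input), "[[:space:]]+")

# Keep lines shaped like "<a...> = value ..." (tag, "=", value)
keep <- vapply(fields,
               function(f) length(f) >= 3 && grepl("^<a", f[1]) && f[2] == "=",
               logical(1))

# The value is the third field on every kept line
results <- as.numeric(vapply(fields[keep], function(f) f[3], character(1)))
results
```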
>>>>
>>>> I tried to use gsubfn for a cleaner construct,
>>>>
>>>> strapply(input, "<ax> += +([0-9.]+)", c, simplify=rbind,combine=as.numeric)
>>>>
>>>> but I can't seem to find the correct regular expression to deal with
>>>> the exponent.
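The character class `[0-9.]` contains digits and the decimal point but not `E`, `-` or `+`, so the capture stops at "1.3770". Appending an optional signed exponent, `E[-+]?[0-9]+`, to the capture group picks up the whole number. The base-R sketch below uses sub() and regexec() instead of gsubfn. The tag pattern `<(a+[xy])>` is an assumption chosen to match `<ax>`, `<ay>`, `<aax>` and `<aay>` but not the `<b...>` tags.

```r
line <- " <ax> =    1.3770E-03     <bx> =    3.4644E-07"

# Original character class: the capture stops at the exponent marker
short <- sub(".*<ax> += +([0-9.]+).*", "\\1", line)              # "1.3770"

# With an optional signed exponent the full number is captured
full <- sub(".*<ax> += +([0-9.]+E[-+]?[0-9]+).*", "\\1", line)   # "1.3770E-03"

# Same idea on every line: regexec() returns the first match on each line
# together with its capture groups (tag name, value)
con <- textConnection("some text
 <ax> =    1.3770E-03     <bx> =    3.4644E-07
 <ay> =    1.9412E-04     <by> =    4.8840E-08

other text
 <aax>  =    1.3770E-03     <bbx> =    3.4644E-07
 <aay>  =    1.9412E-04     <bby> =    4.8840E-08")
input <- readLines(con)
close(con)

m <- regmatches(input,
                regexec("<(a+[xy])> += +([0-9.]+E[-+]?[0-9]+)", input))
m <- m[lengths(m) > 0]  # drop lines without a match

# Named numeric vector: names from the tag group, values from the number group
results <- setNames(as.numeric(vapply(m, function(x) x[3], character(1))),
                    vapply(m, function(x) x[2], character(1)))
results
```

Picking values by tag name rather than by position makes the result independent of both spacing and line order.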
>>>>
>>>>
>>>> Any tips are welcome!
>>>>
>>>>
>>>> Best regards,
>>>>
>>>> baptiste
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>>
>>>
>>> --
>>> Henrique Dallazuanna
>>> Curitiba-Paraná-Brasil
>>> 25° 25' 40" S 49° 16' 22" W
>>>
>>>
>>
>
