[R] parsing numeric values
Gabor Grothendieck
ggrothendieck at gmail.com
Wed Nov 18 20:14:15 CET 2009
Here is a slight variation:
> read.table(textConnection(grep("<aa?[xy]>", input, value = TRUE)),
+ colClasses = c("NULL", "NULL", "numeric"))
          V3         V6
1 0.00137700 3.4644e-07
2 0.00019412 4.8840e-08
3 0.00137700 3.4644e-07
4 0.00019412 4.8840e-08
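
In case it helps to see why this works, here is a minimal self-contained sketch. It assumes an `input` vector shaped like the one in the quoted example (I have retyped it here, so the exact spacing is an assumption), and the names `wanted`, `DF` and `a_values` are only for illustration. grep() keeps the lines carrying an <ax>, <ay>, <aax> or <aay> tag; read.table() splits each of those on whitespace into six fields; and the three-element colClasses is recycled across them, so the tag and "=" fields are dropped and every third field is read as numeric.

```r
# Retyped stand-in for the log excerpt in the quoted message (spacing assumed).
input <- c("some text",
           "  <ax> =    1.3770E-03     <bx> =    3.4644E-07",
           "  <ay> =    1.9412E-04     <by> =    4.8840E-08",
           "",
           "other text",
           "  <aax> =    1.3770E-03     <bbx> =    3.4644E-07",
           "  <aay> =    1.9412E-04     <bby> =    4.8840E-08",
           "lots of other material, including numeric values",
           "1.23E-4 123E5 12.3E-4 123E5 123E-4 123E5")

# Keep only the lines that contain one of the four tags of interest.
wanted <- grep("<aa?[xy]>", input, value = TRUE)

# colClasses is recycled over the six fields: skip, skip, numeric, skip, skip, numeric.
tc <- textConnection(wanted)
DF <- read.table(tc, colClasses = c("NULL", "NULL", "numeric"))
close(tc)

# The a-values sit in the first retained column.
a_values <- DF$V3
```

If only the a-values are needed, DF$V3 is enough; V6 holds the matching b-values from the same lines.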
On Wed, Nov 18, 2009 at 1:54 PM, baptiste auguie
<baptiste.auguie at googlemail.com> wrote:
> Hi,
>
> Thanks for the alternative approach. However, I should have made my
> example more complete in that other lines may also have numeric
> values, which I'm not interested in. Below is an updated problem, with
> my current solution,
>
> tc <- textConnection(
> "some text
> <ax> = 1.3770E-03 <bx> = 3.4644E-07
> <ay> = 1.9412E-04 <by> = 4.8840E-08
>
> other text
> <aax> = 1.3770E-03 <bbx> = 3.4644E-07
> <aay> = 1.9412E-04 <bby> = 4.8840E-08
>
> lots of other material, including numeric values
> 1.23E-4 123E5 12.3E-4 123E5 123E-4 123E5
> 12.3E-4 123E5 12.3E-4 123E5 123E-4 123E5
> etc...")
>
> input <-
> readLines(tc)
> close(tc)
>
> ## I want to retrieve the values for
> ## <ax>, <ay>, <aax> and <aay> only
>
> results <- c(
> strapply(input, "<ax> += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric,
> simplify = rbind),
> strapply(input, "<ay> += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric,
> simplify = rbind),
> strapply(input, "<aax> += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric,
> simplify = rbind),
> strapply(input, "<aay> += +(\\d+\\.\\d+E[-+]?\\d+)", as.numeric,
> simplify = rbind))
>
> results
>
> Using the suggested base R solution, I've come up with this variation,
>
> z <- gsub("[^[:digit:]E.+-]", " ", grep("<ax>|<ay>|<aax>|<aay>", input,
> value=TRUE))
>
> test <- scan(textConnection(z),what=0)
> test[seq(1, length(test), by=2)]
>
>
> Thanks again,
>
> baptiste
>
> 2009/11/18 Bert Gunter <gunter.berton at gene.com>:
>> The previous elegant solutions required the use of the gsubfn package.
>> Nothing wrong with that, of course, but I'm always curious whether still
>> relatively simple base R solutions can be found, as they are often (but not
>> always!) much faster. And anyway, it seems to be in the spirit of your query
>> to try such a solution. So here is one base R approach that I believe works.
>> I'll break it up into 2 lines so you can see what's going on.
>>
>> ## Using your example...
>> ## First replace everything but the number with spaces
>>
>>> z <- gsub("[^[:digit:]E.+-]"," ",input)
>>> z
>> [1] " "
>> [2] " 1.3770E-03 3.4644E-07"
>> [3] " 1.9412E-04 4.8840E-08"
>> [4] ""
>> [5] " "
>> [6] " 1.3770E-03 3.4644E-07"
>> [7] " 1.9412E-04 4.8840E-08"
>>
>> ## Now it can be scanned to a numeric via
>>
>>> z<-scan(textConnection(z),what=0)
>> Read 8 items
>>> z
>> [1] 1.3770e-03 3.4644e-07 1.9412e-04 4.8840e-08 1.3770e-03 3.4644e-07
>> 1.9412e-04 4.8840e-08
>>
>> ########
>> I believe this strategy is reasonably general, but I haven't checked it
>> carefully and would appreciate folks pointing out where it trips up (e.g.
>> perhaps with NA's).
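
One place where I think the gsub() step above can trip up: any stray "E", ".", "+" or "-" in the surrounding text survives the substitution, and scan() then stops on it. The sketch below uses two made-up lines (the second is hypothetical, not from the log) and shows one possible guard, which is to split on spaces and keep only the tokens that look like complete numbers. The names `lines_in`, `num_re` and `vals` are mine.

```r
# Two illustrative lines; the second one is invented to provoke the problem.
lines_in <- c("<ax> = 1.3770E-03",
              "End of run: see log-file, etc...")

# Same substitution as above: blank out everything except digits, E, ., + and -.
z <- gsub("[^[:digit:]E.+-]", " ", lines_in)

# scan() fails here because "E", "-" and "..." are left over as tokens.
tc <- textConnection(z)
res <- tryCatch(scan(tc, what = 0, quiet = TRUE),
                error = function(e) conditionMessage(e))
close(tc)

# Guard: keep only tokens that are complete numbers, with an optional exponent.
tokens <- unlist(strsplit(z, " +"))
num_re <- "^[-+]?([0-9]+\\.?[0-9]*|\\.[0-9]+)(E[-+]?[0-9]+)?$"
vals <- as.numeric(tokens[grepl(num_re, tokens)])
```

Here `res` ends up holding the error message from scan() rather than numbers, while `vals` contains just the one genuine value.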
>>
>> Best,
>>
>> Bert Gunter
>> Genentech Nonclinical Biostatistics
>>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
>> Behalf Of baptiste auguie
>> Sent: Wednesday, November 18, 2009 3:57 AM
>> To: r-help
>> Subject: [R] parsing numeric values
>>
>> Dear list,
>>
>> I'm seeking advice to extract some numeric values from a log file
>> created by an external program. Consider the following example,
>>
>> input <-
>> readLines(textConnection(
>> "some text
>> <ax> = 1.3770E-03 <bx> = 3.4644E-07
>> <ay> = 1.9412E-04 <by> = 4.8840E-08
>>
>> other text
>> <aax> = 1.3770E-03 <bbx> = 3.4644E-07
>> <aay> = 1.9412E-04 <bby> = 4.8840E-08"))
>>
>> ## this is what I want
>> results <- c(as.numeric(strsplit(grep("<ax>", input,val=T), " ")[[1]][8]),
>> as.numeric(strsplit(grep("<ay>", input,val=T), " ")[[1]][8]),
>> as.numeric(strsplit(grep("<aax>", input,val=T), " ")[[1]][9]),
>> as.numeric(strsplit(grep("<aay>", input,val=T), " ")[[1]][9])
>> )
>>
>> ## [1] 0.00137700 0.00019412 0.00137700 0.00019412
>>
>> The use of strsplit is not ideal here as there is a different number
>> of space characters in the lines containing <ax> and <aax> for
>> instance (hence the indices 8 and 9 respectively).
>>
>> I tried to use gsubfn for a cleaner construct,
>>
>> strapply(input, "<ax> += +([0-9.]+)", c, simplify=rbind,combine=as.numeric)
>>
>> but I can't seem to find the correct regular expression to deal with
>> the exponent.
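
For the exponent, one pattern that seems to work is a mantissa followed by an optional group of "E", an optional sign, and digits. Below is a rough base R sketch rather than a gsubfn one; `extract_tag` is a hypothetical helper of my own, the two sample lines are retyped from the example, and I have only tried it on input of this shape.

```r
# Hypothetical helper: pull the number that follows "<tag> =" on matching lines.
extract_tag <- function(tag, lines) {
  # mantissa, then an optional exponent such as E-03 or E+5
  num <- "[-+]?[0-9]+\\.?[0-9]*(E[-+]?[0-9]+)?"
  pat <- paste0(".*<", tag, "> *= *(", num, ").*")
  hits <- grep(pat, lines, value = TRUE, perl = TRUE)
  # replace each matching line by its first capture group, then convert
  as.numeric(sub(pat, "\\1", hits, perl = TRUE))
}

lines_in <- c("  <ax> =    1.3770E-03     <bx> =    3.4644E-07",
              "  <aax> =    1.3770E-03     <bbx> =    3.4644E-07")

ax  <- extract_tag("ax",  lines_in)
aax <- extract_tag("aax", lines_in)
```

Because the tag is matched together with its angle brackets, "ax" does not pick up the <aax> line, so the differing number of spaces no longer matters.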
>>
>>
>> Any tips are welcome!
>>
>>
>> Best regards,
>>
>> baptiste
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>