[R] Differentiate numbers from reference for rows

David Winsemius dwinsemius at comcast.net
Sat Oct 30 15:43:43 CEST 2010


On Oct 30, 2010, at 8:42 AM, Gabor Grothendieck wrote:

> On Fri, Oct 29, 2010 at 6:54 PM, M.Ribeiro  
> <mresendeufv at yahoo.com.br> wrote:
>>
>> So, I am having a tricky reference file to extract information from.
>>
>> The format of the file is
>>
>> x   1 + 4 * 3 + 5 + 6 + 11 * 0.5
>>
>> So, the elements that are not being multiplied (1, 5 and 6) and the
>> elements before the multiplication sign (4 and 11) are actually the
>> references to the rows of a matrix from which I need to extract the
>> elements.
>>
>> The numbers after the multiplication sign are regular numbers
>> Ex:
>>
>>> x<-matrix(20:35)
>>
>> I would like to read rows 1, 4, 5, 6 and 11 and sum them. However,
>> the elements in rows 4 and 11 are multiplied by 3 and 0.5,
>> respectively.
>>
>> So it would be
>> 20 + 23 * 3 + 24 + 25 + 30 * 0.5.
>>
>> And I have this format in different files so I can't do all by hand.
>> Can anybody help me with a script that can differentiate this?
>
>
> I assume that every number, except the second number in a
> number * number pattern, is to be replaced by the element of x in that
> row.  Try this.  We define a regular expression that matches the first
> number ([0-9]+) of each potential pair, optionally (?) followed by
> spaces ( *), a star (\\*), more spaces ( *) and digits ([0-9.]+).
> gsubfn passes the first and second backreferences (the matches to the
> parenthesized portions of the regular expression) to f and inserts the
> output of f where the match had been.
>
> library(gsubfn)
> f <- function(a, b) paste(x[as.numeric(a)], b)
> s2 <- gsubfn("([0-9]+)( *\\* *[0-9.]+)?", f, s)
>
> If the objective is then to perform the calculation that each string
> represents, try this:
> sapply(s2, function(x) eval(parse(text = x)))
>
> For example,
>
>> s <- c("1 + 4 * 3 + 5 + 6 + 11 * 0.5",
> +        "1 + 4 * 3 + 5 + 6 + 11 * 0.5")
>> x <- matrix(20:35)
>> f <- function(a, b) paste(x[as.numeric(a)], b)
>> s2 <- gsubfn("([0-9]+)( *\\* *[0-9.]+)?", f, s)
>> s2
> [1] "20  + 23  * 3 + 24  + 25  + 30  * 0.5"
> [2] "20  + 23  * 3 + 24  + 25  + 30  * 0.5"
>> sapply(s2, function(x) eval(parse(text = x)))
> 20  + 23  * 3 + 24  + 25  + 30  * 0.5
>                                   153
> 20  + 23  * 3 + 24  + 25  + 30  * 0.5
>                                   153
>
> For more see the gsubfn home page at http://gsubfn.googlecode.com


I am scratching my head regarding the workings of gsubfn. It appears that  
as gsubfn moves across the input strings it will match either just  
"[0-9]+" or the longer "[0-9]+ *\\* *[0-9.]+".

In either case f() looks up x[a], using the first back-reference as the  
index "a". If the second back-reference also matches, it is passed as  
"b" (for example " * 3"), so x[a] is followed by "* 3" and is therefore  
destined to be multiplied by that number. I cannot quite figure out  
what value the second back-reference takes when the optional group does  
not match, and why that doesn't break f() by supplying only one argument  
to a two-argument function. Maybe it's due to this? (So can you  
comment on how optional back-references return values?)

 > paste("a", NULL)
[1] "a "
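
For comparison, here is a minimal base-R sketch (not gsubfn itself) of  
what an unmatched optional group looks like. With regexec() and  
regmatches(), the second back-reference comes back as an empty string  
rather than NULL. That gsubfn hands f() the same "" is only an  
assumption on my part, though it would be consistent with the trailing  
space in "20 " in the output above. The names pat, hits and parts are  
made up for illustration.

```r
# Same pattern as in the gsubfn call: a number, then an optional "* number".
pat <- "([0-9]+)( *\\* *[0-9.]+)?"

# Two hypothetical matches: one without and one with the optional part.
hits <- c("5", "4 * 3")

# For each hit: the full match, back-reference 1, back-reference 2.
parts <- regmatches(hits, regexec(pat, hits))
parts[[1]]   # second back-reference is "" because the group did not match
parts[[2]]   # second back-reference is " * 3"

x <- matrix(20:35)
f <- function(a, b) paste(x[as.numeric(a)], b)

# f() always receives two arguments; b is just empty in the first case.
r1 <- f(parts[[1]][2], parts[[1]][3])
r2 <- f(parts[[2]][2], parts[[2]][3])
```

So, at least in base R, a two-argument f() is never called with a  
missing argument; paste() simply appends the separator and an empty  
string.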

Furthermore, somehow (and this is further functional magic I am  
missing) these results are concatenated into a string and then  
evaluated, a step which I do get.
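
The splice-and-evaluate step can also be sketched in base R. This is  
an illustration of the idea, not a description of gsubfn's actual  
internals: gregexpr() locates every full match, f() builds one  
replacement per match, and the replacement form of regmatches() writes  
them back between the untouched pieces of the string. The names pat, m,  
hits, parts and repl are hypothetical.

```r
x <- matrix(20:35)
s <- "1 + 4 * 3 + 5 + 6 + 11 * 0.5"
pat <- "([0-9]+)( *\\* *[0-9.]+)?"

# Replacement function from the thread; b is "" when the group is unmatched.
f <- function(a, b) paste(x[as.numeric(a)], b)

m <- gregexpr(pat, s)                           # positions of all full matches
hits <- regmatches(s, m)[[1]]                   # the matched substrings
parts <- regmatches(hits, regexec(pat, hits))   # full match plus both groups

# One replacement string per match.
repl <- vapply(parts, function(p) f(p[2], p[3]), character(1))

# Write the replacements back where the matches were.
s2 <- s
regmatches(s2, m) <- list(repl)
s2

# Evaluate the rewritten expression.
total <- eval(parse(text = s2))
total
```

The text between matches (" + ") is left alone, which is why the  
result is still a valid arithmetic expression for parse() and eval().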

-- 

David Winsemius, MD
West Hartford, CT


