[R] Extracting numbers from a character variable of different types

David Winsemius dwinsemius at comcast.net
Sun Mar 18 16:57:05 CET 2012


On Mar 18, 2012, at 11:37 AM, David Winsemius wrote:

>
> On Mar 18, 2012, at 10:44 AM, irene wrote:
>
>> Hello,
>>
>> I have a file which contains a column with age, which is
>> represented in the following two patterns:
>>
>> 1. "007/A" or "007/a" or "7 /a" ... In this case A or a means
>> years, and I would like to extract only the numeric value, e.g. 7
>> in the above case, if this pattern exists in a line of the file.
>>
>> 2. "004/M" or "004/m", where M or m means months ... For these
>> lines I would like to first extract the numeric value of months,
>> e.g. 4, and then convert it into a value in years, which would be
>> 0.33, i.e. 4 divided by 12.
>
> I thought it easier to get to months as an initial step:
>
> > dfrm <- read.table(text="'007/A'\n'007/a' \n '7 /a '\n '004/M'\n'004/m'")

As I thought about it further, it's easier (and clearer) to do it in years:

 > dfrm$agenew3 <- sub("[mM]", "12", dfrm$V1)
 > dfrm$agenew3 <- sub("[aA]", "1", dfrm$agenew3)
 > sapply(dfrm$agenew3, function(x) eval(parse(text=x)) )
     007/1     007/1     7 /1     004/12    004/12
7.0000000 7.0000000 7.0000000 0.3333333 0.3333333
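For comparison, here is a minimal sketch that avoids eval(parse()) altogether. It assumes every entry has the form digits, optional spaces, "/", optional spaces, then a single unit letter (A/a for years, M/m for months); the helper name age_to_years() is hypothetical, and entries that do not match the pattern come back as NA rather than being evaluated as code.

```r
# Hypothetical helper: convert strings like "007/A" or "004/m" to years.
# Assumes the format: digits, optional spaces, "/", optional spaces, A/a or M/m.
age_to_years <- function(x) {
  x <- as.character(x)                      # read.table may return a factor
  pat <- "^\\s*(\\d+)\\s*/\\s*([aAmM])\\s*$"
  ok <- grepl(pat, x)                       # which entries match the format
  num <- rep(NA_real_, length(x))
  unit <- rep(NA_character_, length(x))
  num[ok]  <- as.numeric(sub(pat, "\\1", x[ok]))   # "007" -> 7
  unit[ok] <- toupper(sub(pat, "\\2", x[ok]))      # "a" -> "A", "m" -> "M"
  # Months are divided by 12; years are kept; non-matching entries stay NA
  ifelse(unit == "M", num / 12, num)
}

age_to_years(c("007/A", "007/a", "7 /a ", "004/M", "004/m"))
```

Because nothing is passed to parse(), a malformed entry in the file cannot be executed as R code; it simply yields NA.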

>
> > dfrm$agenew <- sub("(^\\d+\\s*)(/)([aA])","\\1 * 12", dfrm$V1)
> > dfrm$agenew2 <- sub("(^\\d+\\s*)(/)([mM])","\\1", dfrm$agenew)
> > dfrm$agenew2
> [1] "007 * 12" "007 * 12" "7  * 12 " "004"      "004"
> > eval(parse(text=dfrm$agenew2))
> [1] 4
> > sapply(dfrm$agenew2, function(x) eval(parse(text=x)) )
> 007 * 12 007 * 12 7  * 12       004      004
>      84       84       84        4        4
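The single eval(parse(text=dfrm$agenew2)) call returns only 4 because parse() produces an expression vector with one element per string, and eval() on such a vector returns the value of the last element only. A minimal sketch of evaluating every element instead, reusing the strings shown in dfrm$agenew2 above:

```r
# The strings produced by the sub() calls above
exprs <- c("007 * 12", "007 * 12", "7  * 12 ", "004", "004")

parsed <- parse(text = exprs)   # expression vector of length 5
eval(parsed)                    # value of the last element only: 4
sapply(parsed, eval)            # each element evaluated: 84 84 84 4 4
```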
-- 

David Winsemius, MD
West Hartford, CT


