[R] character to numeric conversion

Peter Dalgaard P.Dalgaard at biostat.ku.dk
Mon Mar 19 10:49:32 CET 2007


Robin Hankin wrote:
> Hi.
>
> Is there a straightforward way to convert a character string  
> containing comma-delimited
> numbers  to a numeric vector?
>
> In my application, I use
>
> system(executable.string, intern=TRUE)
>
> which returns a string like
>
> "[0.E-38, 2.096751179214927596171268230,  
> 3.678944959657480671183123052, 4.976528845643001020345216157,  
> 6.072390165503099343887569007, 7.007958550337542210168866070,  
> 7.807464185827177139302778736, 8.486139455817034846608029724,  
> 9.053706780665060873259065771, 9.516172308326877463284426111,  
> 9.876856047379733199590985269, 10.13695826383869052536062804,  
> 10.29580989588667234885515374, 10.35092785255025551187463209,  
> 10.29795676261278695909972578, 10.13052574735986793562227138,  
> 9.839990935943625006580521345, 9.414977153151389385186358494,  
> 8.840562526759586215404890348, 8.096830792651667245232639586,  
> 7.156244887881612948153311800, 5.978569259122249264778017262,  
> 4.499809670330265066808481929, 2.602689685444383764768503589, 0.E-38]"
>
>
> (the output is a single line).   In a big run, the string may contain  
> 10^5 or possibly 10^6 numbers.
>
> What's the recommended way to convert this to a numeric vector?
>
>   
scan() on a text connection:

> x <- "[0.E-38, 2.096751179214927596171268230,
+ 3.678944959657480671183123052, 4.976528845643001020345216157,
+ 6.072390165503099343887569007, 7.007958550337542210168866070,
+ 7.807464185827177139302778736, 8.486139455817034846608029724,
+ 9.053706780665060873259065771, 9.516172308326877463284426111,
+ 9.876856047379733199590985269, 10.13695826383869052536062804,
+ 10.29580989588667234885515374, 10.35092785255025551187463209,
+ 10.29795676261278695909972578, 10.13052574735986793562227138,
+ 9.839990935943625006580521345, 9.414977153151389385186358494,
+ 8.840562526759586215404890348, 8.096830792651667245232639586,
+ 7.156244887881612948153311800, 5.978569259122249264778017262,
+ 4.499809670330265066808481929, 2.602689685444383764768503589, 0.E-38]"
> tc <- textConnection(gsub("[][ \n]","",x))
> xx <- scan(tc,sep=",")
Read 25 items
> summary(xx)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.000   4.977   8.097   7.049   9.840  10.350
> close(tc)

(By far, the hardest bit was getting the gsub regexp right...)
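
For reuse, the steps above can be wrapped in a small function. This is only a sketch: the name parse_bracketed and the short test string are made up for illustration and are not from the original post. In the pattern "[][ \n]", the "]" comes first inside the bracket expression so that it is taken literally, followed by "[", a space and a newline.

```r
# Hypothetical helper (name and test input are illustrative assumptions):
# strip brackets, spaces and newlines, then let scan() split on commas.
parse_bracketed <- function(s) {
  cleaned <- gsub("[][ \n]", "", s)   # "]" listed first, so it is literal
  tc <- textConnection(cleaned)
  on.exit(close(tc))                  # close the connection even on error
  scan(tc, sep = ",", quiet = TRUE)   # default what = double()
}

xx <- parse_bracketed("[0.E-38, 2.5,\n 10.25]")
print(xx)
```

The on.exit() call replaces the explicit close(tc), so the connection is also released if scan() stops with an error.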

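Alternatively, just get rid of the brackets and replace the commas with
whitespace, so that scan() can use its default whitespace separator. The
problem with sep="," is that scan() gets confused by a line ending that
follows a comma, which is why the version above also had to strip newlines.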

> tc <- textConnection(gsub(",", " ", gsub("[][]", "", x)))
> xx <- scan(tc)
Read 25 items
> summary(xx)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.000   4.977   8.097   7.049   9.840  10.350
> close(tc)
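
The whitespace variant can be sketched as a function too. The name parse_ws and the sample string are hypothetical. The sketch also assumes a reasonably current R, where scan() accepts a text= argument and the explicit text connection is not needed. That argument is not used in the transcript above.

```r
# Hypothetical helper (illustrative only): drop the brackets, turn commas
# into spaces, and read with scan()'s default whitespace separator.
parse_ws <- function(s) {
  cleaned <- gsub(",", " ", gsub("[][]", "", s))
  scan(text = cleaned, quiet = TRUE)  # text= assumes a current R version
}

# A line break after a comma is harmless here, since it is just whitespace.
yy <- parse_ws("[1.5, 2.25,\n3.0, 4.75]")
print(yy)
```

Because newlines count as ordinary whitespace for the default separator, the input does not need to be collapsed to a single line first.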



>
>
> --
> Robin Hankin
> Uncertainty Analyst
> National Oceanography Centre, Southampton
> European Way, Southampton SO14 3ZH, UK
>   tel  023-8059-7743
>


-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907


