[R] string parsing

Gabor Grothendieck ggrothendieck at gmail.com
Wed Feb 16 21:25:48 CET 2011


On Tue, Feb 15, 2011 at 5:20 PM, Sam Steingold <sds at gnu.org> wrote:
> I am trying to get stock metadata from Yahoo finance (or maybe there is
> a better source?)
> here is what I did so far:
>
> yahoo.url <- "http://finance.yahoo.com/d/quotes.csv?f=j1jka2&s=";
> stocks <- c("IBM","NOIZ","MSFT","LNN","C","BODY","F"); # just some samples
> socket <- url(paste(yahoo.url,sep="",paste(stocks,collapse="+")),open="r");
> data <- read.csv(socket, header = FALSE);
> close(socket);
> data is now:
>       V1     V2     V3        V4
> 1  200.5B 116.00 166.25   4965150
> 2   19.1M   3.75   5.47      8521
> 3  226.6B  22.73  31.58  57127000
> 4  886.4M  30.80  74.54    226690
> 5  142.4B   3.21   5.15 541804992
> 6  276.4M  11.98  21.30    149656
> 7 55.823B   9.75  18.97  89369000
>
> now I need to do this:
>
> --> convert 55.823B to 55.823e9 and 19.1M to 19.1e6
>
> parse.num <- function (s) { as.numeric(gsub("M$","e6",gsub("B$","e9",s))); }
> data[1]<-lapply(data[1],parse.num);
>
> this seems awfully inefficient (two regexp substitutions);
> is there a better way?
>
> --> iterate over stocks & data at the same time and put the results into
> a hash table:
> for (i in 1:length(stocks)) cache[[stocks[i]]] <- data[i,];
>
> I do get the right results,
> but I am wondering if I am doing it "the right R way".
> E.g., the hash table value is a data frame.
> A structure (record?) seems more appropriate.
>

Check the example at the end of section 2 of the gsubfn vignette:

http://cran.r-project.org/web/packages/gsubfn/vignettes/gsubfn.pdf
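For the B/M suffixes, gsubfn() itself can do the replacement in a single
pass by passing a named list of replacements; untested, and not necessarily
the vignette's exact example, but something along these lines should work:

library(gsubfn)

parse.num <- function(s) {
  # replace a trailing B or M with its exponent in one pass,
  # then convert the result to numeric
  as.numeric(gsubfn("[BM]$", list(B = "e9", M = "e6"), as.character(s)))
}

parse.num(c("55.823B", "19.1M", "116.00"))
# [1] 5.5823e+10 1.9100e+07 1.1600e+02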
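For your second question, the usual R idiom would be to keep everything in
the data frame and index rows by ticker via rownames; if you really want a
hash table, an environment is the standard choice. A quick sketch:

# use the ticker symbols as row names and index the data frame directly
rownames(data) <- stocks
data["MSFT", ]      # one-row data frame for that ticker

# or, if a real hash table is wanted, an environment hashes by name
cache <- new.env(hash = TRUE)
for (i in seq_along(stocks)) assign(stocks[i], data[i, ], envir = cache)
get("MSFT", envir = cache)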


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


