[R] string parsing

Wed Feb 16 16:28:21 CET 2011

It's only "awfully" inefficient if it's a bottleneck. You're not doing this more than once per item fetched from the network, and the parsing time is insignificant relative to the fetch. If it were somehow in your inner loop, it would be worth worrying about, but your purpose here is to eliminate the Ms and Bs once so that you never see them again. If performance is a problem, look at your inner loop, not here.
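
If the two nested gsub() calls in the quoted parse.num still bother you, here is a rough sketch of one alternative that uses no regexps at all. The name parse.num2 and the assumption that only "M" and "B" suffixes occur are mine, not from the original post, and I make no claim that it is measurably faster:

```r
# parse.num as posted in the quoted message (trailing semicolons dropped).
parse.num <- function(s) {
  as.numeric(gsub("M$", "e6", gsub("B$", "e9", s)))
}

# Hypothetical alternative: peel off the last character and scale by a
# lookup table. Assumes the only suffixes are "M" and "B".
parse.num2 <- function(s) {
  s <- as.character(s)              # read.csv may have produced a factor
  mult <- c(M = 1e6, B = 1e9)       # suffix -> multiplier
  n <- nchar(s)
  suffix <- substr(s, n, n)         # last character of each string
  has.suffix <- suffix %in% names(mult)
  value <- as.numeric(ifelse(has.suffix, substr(s, 1, n - 1), s))
  scale <- ifelse(has.suffix, mult[suffix], 1)
  unname(value * scale)
}

x <- c("200.5B", "19.1M", "55.823B", "116.00")
# Both versions are vectorised, so one call handles the whole column.
all.equal(parse.num(x), parse.num2(x))
```

Either version runs once per fetch, so which one you pick should not matter for the overall run time.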

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Mike Marchywka
Sent: Tuesday, February 15, 2011 9:01 PM
To: sds at gnu.org; r-help at stat.math.ethz.ch
Subject: Re: [R] string parsing

----------------------------------------
> To: r-help at stat.math.ethz.ch
> From: sds at gnu.org
> Date: Tue, 15 Feb 2011 17:20:11 -0500
> Subject: [R] string parsing
>
> I am trying to get stock metadata from Yahoo finance (or maybe there is
> a better source?)

Search this document for "yahoo":

http://cran.r-project.org/web/packages/quantmod/quantmod.pdf

as a perennial page scraper, I was amazed this existed :)

> here is what I did so far:
>
> yahoo.url <- "http://finance.yahoo.com/d/quotes.csv?f=j1jka2&s="
> stocks <- c("IBM", "NOIZ", "MSFT", "LNN", "C", "BODY", "F")  # just some samples
> socket <- url(paste(yahoo.url, paste(stocks, collapse = "+"), sep = ""), open = "r")
> data <- read.csv(socket, header = FALSE)
> close(socket)
>
> data is now:
>        V1     V2     V3        V4
> 1  200.5B 116.00 166.25   4965150
> 2   19.1M   3.75   5.47      8521
> 3  226.6B  22.73  31.58  57127000
> 4  886.4M  30.80  74.54    226690
> 5  142.4B   3.21   5.15 541804992
> 6  276.4M  11.98  21.30    149656
> 7 55.823B   9.75  18.97  89369000
>
> now I need to do this:
>
> --> convert 55.823B to 55.823e9 and 19.1M to 19.1e6
>
> parse.num <- function (s) { as.numeric(gsub("M$","e6",gsub("B$","e9",s))); }
> data[1]<-lapply(data[1],parse.num);
>
> this seems awfully inefficient (two regexp substitutions);
> is there a better way?
>
> --> iterate over stocks & data at the same time and put the results into
> a hash table:
> for (i in 1:length(stocks)) cache[[stocks[i]]] <- data[i,];
>
> I do get the right results,
> but I am wondering if I am doing it "the right R way".
> E.g., the hash table value is a data frame.
> A structure (record?) seems more appropriate.
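
On the "hash table of records" question, two common idioms are sketched below, using a few rows copied from the data above with V1 already converted to numbers. Treat this as one possible approach under those assumptions, not the canonical R way; the variable names follow the original post:

```r
# Sample rows copied from the data frame shown in the message.
stocks <- c("IBM", "NOIZ", "MSFT")
data <- data.frame(V1 = c(200.5e9, 19.1e6, 226.6e9),
                   V2 = c(116.00, 3.75, 22.73),
                   V3 = c(166.25, 5.47, 31.58),
                   V4 = c(4965150, 8521, 57127000))

# Option 1: use the ticker symbols as row names, so the data frame
# itself serves as the lookup table.
rownames(data) <- stocks
data["MSFT", "V3"]

# Option 2: store one plain named list ("record") per stock in an
# environment, which R implements as a hash table.
cache <- new.env(hash = TRUE)
for (i in seq_along(stocks)) {
  assign(stocks[i], as.list(data[i, ]), envir = cache)
}
get("IBM", envir = cache)$V2
```

Option 1 keeps everything vectorised, which is usually the more natural fit when every record has the same fields. Option 2 is closer to the hash table in the original loop, but each value is a list rather than a one-row data frame.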
>
> thanks!
>
> --
> Sam Steingold (http://sds.podval.org/) on CentOS release 5.3 (Final)
> http://pmw.org.il http://ffii.org http://camera.org http://honestreporting.com
> http://iris.org.il http://mideasttruth.com http://thereligionofpeace.com
> I haven't lost my mind -- it's backed up on tape somewhere.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
