[R] Best way to get the prices from these strings?

jim holtman jholtman at gmail.com
Wed Jan 29 16:43:01 CET 2014


Here is another approach:


> thePrices<-
+     c("id=\"p0\">$69.95</div>", "id=\"p1\">$44.95</div>", "id=\"p2\">$69.95</div>",
+       "id=\"p3\">$59.95</div>", "id=\"p4\">$69.95</div>", "id=\"p5\">$79.95</div>",
+       "id=\"p6\">$89.95</div>", "id=\"p7\">$59.95</div>", "id=\"p8\">$59.95</div>",
+       "id=\"p9\">$79.95</div>", "id=\"p10\">$79.95</div>", "id=\"p11\">$89.95</div>",
+       "id=\"p12\">$89.95</div>", "id=\"p13\">$79.95</div>", "id=\"p14\">$89.95</div>",
+       "id=\"p15\">$79.95</div>", "id=\"p16\">$39.95</div>", "id=\"p17\">$59.95</div>",
+       "id=\"p18\">$69.95</div>", "id=\"p19\">$83.95</div>", "id=\"p20\">$73.95</div>",
+       "id=\"p21\">$83.95</div>", "id=\"p22\">$93.95</div>", "id=\"p23\">$87.95</div>",
+       "id=\"p24\">$91.95</div>", "id=\"p25\">$99.95</div>", "id=\"p26\">$61.95</div>\""
+     )
> require(gsubfn)
> as.numeric(gsubfn(".*>.([0-9.]+).*", "\\1", thePrices))
 [1] 69.95 44.95 69.95 59.95 69.95 79.95 89.95 59.95 59.95 79.95 79.95 89.95 89.95
[14] 79.95 89.95 79.95 39.95 59.95 69.95 83.95 73.95 83.95 93.95 87.95 91.95 99.95
[27] 61.95
>
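For reference, the gsubfn package is not strictly required for this particular call: when its replacement argument is a plain string, gsubfn() behaves like base R's gsub(), so base sub() with the same pattern should give the same numbers. A minimal sketch using only base R; the short vector `x` is just a sample of thePrices chosen for illustration (it includes the element with the stray trailing quote):

```r
# Sample of the scraped strings (a subset of thePrices)
x <- c("id=\"p0\">$69.95</div>",
       "id=\"p1\">$44.95</div>",
       "id=\"p26\">$61.95</div>\"")

# Same pattern as above: '.*>' runs up to a '>', '.' skips one character
# (the '$'), '([0-9.]+)' captures the price, '.*' discards the rest.
# sub() then replaces the whole string with the captured group.
prices <- as.numeric(sub(".*>.([0-9.]+).*", "\\1", x))
print(prices)   # [1] 69.95 44.95 61.95
```

Because '.*' is greedy, the match first tries the last '>' in each string and backtracks to the '>' just before the '$' when no digits follow; that is why the trailing '\"' on the last element does no harm.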

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Wed, Jan 29, 2014 at 9:29 AM, Keith S Weintraub <kw1958 at gmail.com> wrote:
> Folks,
>
> I got the following prices by scraping a web page just for my own edification:
>
> thePrices<-
> c("id=\"p0\">$69.95</div>", "id=\"p1\">$44.95</div>", "id=\"p2\">$69.95</div>",
> "id=\"p3\">$59.95</div>", "id=\"p4\">$69.95</div>", "id=\"p5\">$79.95</div>",
> "id=\"p6\">$89.95</div>", "id=\"p7\">$59.95</div>", "id=\"p8\">$59.95</div>",
> "id=\"p9\">$79.95</div>", "id=\"p10\">$79.95</div>", "id=\"p11\">$89.95</div>",
> "id=\"p12\">$89.95</div>", "id=\"p13\">$79.95</div>", "id=\"p14\">$89.95</div>",
> "id=\"p15\">$79.95</div>", "id=\"p16\">$39.95</div>", "id=\"p17\">$59.95</div>",
> "id=\"p18\">$69.95</div>", "id=\"p19\">$83.95</div>", "id=\"p20\">$73.95</div>",
> "id=\"p21\">$83.95</div>", "id=\"p22\">$93.95</div>", "id=\"p23\">$87.95</div>",
> "id=\"p24\">$91.95</div>", "id=\"p25\">$99.95</div>", "id=\"p26\">$61.95</div>\""
> )
>
> Using lapply, strsplit (twice), unlist, etc., I was able to get the result I wanted (the prices as numbers, e.g. 59.95), but I am sure there is a much better way that someone might be able to point out for me.
>
> Note that I tried various regexes, none of which worked.
>
> Is part of the difficulty that the strings in thePrices have multiple \"'s in them?
>
> Thanks for your time,
> Best,
> KW
>
> --
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
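
On the question about the \"s: the backslashes are only R's escape syntax for a double quote inside a double-quoted string literal. They are not characters in the data, so a regex never has to match them. A minimal sketch that checks this, plus an alternative that extracts the price with regmatches()/regexpr() instead of substituting around it (assuming every price is written as '$', digits, a dot, and digits):

```r
s <- "id=\"p0\">$69.95</div>"

# The escapes are not part of the string: cat() shows the actual characters,
# and nchar() counts each \" as a single character.
cat(s, "\n")    # id="p0">$69.95</div>
n <- nchar(s)   # 20

# Extraction instead of substitution: find the first "$<digits>.<digits>"
# in each string and keep only that match. '\\$' is a literal dollar sign.
x <- c(s, "id=\"p26\">$61.95</div>\"")
m <- regmatches(x, regexpr("\\$[0-9]+\\.[0-9]+", x))
prices <- as.numeric(substring(m, 2))   # drop the leading '$'
print(prices)   # [1] 69.95 61.95
```

Note that regmatches() silently drops elements with no match, so on real scraped pages it is worth checking that length(prices) equals length(x).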

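The original post does not show the lapply()/strsplit() code, so the following is only a guess at what a "split twice" approach might look like (split once on '$', then on '<'), placed next to a vectorised equivalent for comparison. Both the reconstruction and the variable names are hypothetical:

```r
x <- c("id=\"p0\">$69.95</div>", "id=\"p1\">$44.95</div>")

# Hypothetical split-twice version: keep the piece after the '$',
# then the piece before the next '<'.
loop_prices <- as.numeric(unlist(lapply(x, function(s) {
  after_dollar <- strsplit(s, "$", fixed = TRUE)[[1]][2]
  strsplit(after_dollar, "<", fixed = TRUE)[[1]][1]
})))

# strsplit() is already vectorised: splitting on either '$' or '<' in one
# pass leaves the price as the second piece of every string.
parts <- strsplit(x, "[$<]")
vec_prices <- as.numeric(sapply(parts, `[`, 2))

print(loop_prices)   # [1] 69.95 44.95
```

The single character-class split removes both the explicit per-string function and the second strsplit() call; it relies on each string containing exactly one '$', placed directly before the price.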


