[R] Need help with text processing / string split

jim holtman jholtman at gmail.com
Sun May 15 23:33:55 CEST 2011


Try this:

> x <- read.table('/temp/tbl.txt', sep = ',', header = TRUE, as.is = TRUE)
> # remove commas from the Costs column; store the result in a new column, Cost
> x$Cost <- gsub(',', '', x$Costs)
> # split the Cost
> temp <- strsplit(x$Cost, "\\$")  # "$" is special, so it is escaped
> temp <- do.call(rbind, temp)  # create a matrix
> mode(temp) <- 'numeric' # convert to numeric
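> # each string starts with "$", so column 1 is empty (NA after conversion);
> # the two values are in columns 2 and 3
> x$Cost1 <- temp[, 2]
> x$Cost2 <- temp[, 3]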
> head(x)
                               Address      Township       Parcel                Sale.Date
2                          10 PACER LN East Norriton 330006712005  Bnkrptcy-PP to6/29/2011
3                           6 BALA AVE  Lower Merion 400003292007          STAYED5/25/2011
4             109 STONY WAY, Condo 109 East Norriton 330008575662  Bnkrptcy-PP to6/29/2011
5                   613 NORTHAMPTON RD East Norriton 330006103002    Postponed to5/25/2011
6                      67 HIGH GATE LN      Whitpain 660002716764 Pstpnd by CO to5/25/2011
7 236 Arundel Ave aka 236 Arundel Road       Horsham 360000136008        For Sale5/25/2011
                 Costs               Cost     Cost1   Cost2
2 $173,933.60$2,410.28 $173933.60$2410.28 173933.60 2410.28
3   $264,640.36$168.00  $264640.36$168.00 264640.36  168.00
4  $70,029.04$1,483.59  $70029.04$1483.59  70029.04 1483.59
5 $254,873.19$1,772.62 $254873.19$1772.62 254873.19 1772.62
6 $404,507.59$1,947.90 $404507.59$1947.90 404507.59 1947.90
7 $252,472.27$1,034.51 $252472.27$1034.51 252472.27 1034.51
>
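
If you want to try the same idea without tbl.txt, here is a minimal
self-contained sketch. The three strings in costs are copied from the output
above; the names costs, clean, parts, vals and res are only for illustration
and are not part of the original table. It uses fixed = TRUE so that "$" is
taken literally instead of being escaped, and vapply() in place of
do.call(rbind, ...). That is just one of several ways to do it.

```r
# Sample values copied from the Costs column shown above (illustration only)
costs <- c("$173,933.60$2,410.28", "$264,640.36$168.00", "$70,029.04$1,483.59")

# Drop the thousands separators
clean <- gsub(",", "", costs, fixed = TRUE)

# Split on a literal "$"; fixed = TRUE means no regex escaping is needed.
# Each string starts with "$", so the first piece is always "".
parts <- strsplit(clean, "$", fixed = TRUE)

# Keep pieces 2 and 3, convert to numeric, one row per input string
vals <- t(vapply(parts, function(p) as.numeric(p[2:3]), numeric(2)))

res <- data.frame(Costs = costs, Cost1 = vals[, 1], Cost2 = vals[, 2])
print(res)
```

Because every string begins with "$", strsplit() returns an empty first
piece, which is why pieces 2 and 3 are the ones kept.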


On Sun, May 15, 2011 at 3:50 PM, eric <ericstrom at aol.com> wrote:
> I used screen scraping to extract some information and put it into a table
> called tbl. Now I want to modify the table a bit so the data can be more
> useful. Here's the code I used:
>
> library(XML)
> rm(list=ls())
> url <-
> "http://webapp.montcopa.org/sherreal/salelist.asp?saledate=05/25/2011"
> tbl <-data.frame(readHTMLTable(url))[2:405, c(3,5,6,8,9)]
> names(tbl) <- c("Address", "Township", "Parcel", "Sale Date", "Costs")
>
> tbl is attached as a text file for your convenience. Entries in the last
> column of the data frame (tbl$Costs) look like this: $173,933.60$2,410.28
> http://r.789695.n4.nabble.com/file/n3524793/tbl.txt
>
> How do I:
>
> 1. Split the string
> 2. Have the two values show up as actual numbers that can be used
> 3. Put the numbers in two separate columns of the dataframe.
>
> In other words $173,933.60$2,410.28 would show up as 173933.60 in one column
> and 2410.28 would show up in a second column of tbl
>
> I tried using strsplit but I could not get it working properly.
>
>
>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?


