[R] Converting dollar value (factors) to numeric
Greg.Snow at imail.org
Thu May 6 20:14:31 CEST 2010
This can be further simplified by combining the 2 sub calls into a single call: gsub('[$,]','',as.character(x)).
Note that this will also convert a malformed string such as "$123$35,24,,$1$$2,,3.4" into a number, when you may have wanted input like that to give a warning and/or an NA value.
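If a warning and NA are preferred for malformed input, one option is to validate the shape of each string before stripping anything. The following is only a minimal sketch: the function name dollar_to_numeric and the exact pattern (optional minus sign, optional '$', digits with or without well-formed thousands separators, optional decimals) are assumptions made for illustration, not something proposed in this thread.

```r
# Hypothetical helper: convert dollar strings to numeric, but return NA
# (with a warning) for anything that does not look like a dollar amount.
dollar_to_numeric <- function(x) {
  x <- as.character(x)
  # optional "-", optional "$", then either comma-grouped digits
  # (1,135,359) or plain digits (135359), then optional decimals
  pattern <- "^-?\\$?([0-9]{1,3}(,[0-9]{3})+|[0-9]+)(\\.[0-9]+)?$"
  ok <- grepl(pattern, x)
  if (any(!ok & !is.na(x))) {
    warning("malformed dollar values converted to NA")
  }
  out <- rep(NA_real_, length(x))
  out[ok] <- as.numeric(gsub("[$,]", "", x[ok]))
  out
}

good <- dollar_to_numeric(c("$135,359.00", "$1,135,359.00", "112.11"))
bad  <- suppressWarnings(dollar_to_numeric("$123$35,24,,$1$$2,,3.4"))
```

Here good holds three ordinary numbers, while bad is NA rather than a silently concatenated value.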
The g in gsub stands for global (meaning replace every '$' and ',' not just the first one) rather than greedy (which has a different meaning in regular expressions).
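The distinction can be seen with a short sketch, using the x from the quoted message below. The "<a><b>c" string is made up here purely to illustrate greedy versus non-greedy matching, which is a separate matter from sub versus gsub.

```r
x <- c("$135,359.00", "$135359.00", "$1,135,359.00")

# sub replaces only the first match in each string,
# so the third element keeps its second comma
first_only <- sub(",", "", x)

# gsub replaces every match; a character class handles '$' and ',' at once
all_gone <- gsub("[$,]", "", x)
cost <- as.numeric(all_gone)

# Greediness is about how much a single match consumes:
greedy     <- sub("<.*>",  "", "<a><b>c")   # ".*" grabs as much as it can
non_greedy <- sub("<.*?>", "", "<a><b>c")   # ".*?" grabs as little as it can
```

With sub, the third value still contains a comma, which is why as.numeric returned NA for it in the original attempt.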
This discussion brings up a related issue that I have thought about for a while. The help for read.table, in the section on colClasses, says that you can specify other conversions from character as long as there is a method for as corresponding to what you put in.
This suggests to me the approach of writing a conversion function called something like "as.dollar", then setting colClasses=c('numeric','dollar','dollar','factor') or something like that, and having the middle 2 columns run through the function. However, my first quick attempt failed (the doc says the method needs to be in the methods package, and my quick attempt with setMethod created a local copy). There is also the possible problem that this would create a column with class dollar when I want a simple numeric.
So this brings up 2 questions:
1. Has anyone found a way to create a method for as in the methods package such that my idea above would work? (Preferably without much more work than the post-processing already suggested.)
2. If the answer to 1 above is no, are others interested in this type of functionality, and should we move the discussion to r-devel as a feature request?
Even nicer would be a simple way to go from a single character vector to multiple columns in the data frame. I remember working with a file once where the 1st 3 columns were comma separated (no spaces), but everything after that was white space separated. I read it in as whitespace separated, then had to post-process the 1st column into 3. But getting all the semantics of 1 to multiple could be tricky. That particular case could also have been easier if the sep argument to read.table could be a regular expression, but that would probably slow things down for the simple cases.
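On question 1, one approach that appears to work in current versions of R is to register the conversion with setAs rather than setMethod. This is a sketch under assumptions: the class name "dollar" follows the idea above, the inline CSV text is invented for illustration, and behaviour in the R version current at the time of this message has not been checked.

```r
library(methods)

# Declare a placeholder class so that setAs has a target to refer to
setClass("dollar")

# Register the character -> "dollar" coercion; it returns a plain numeric
setAs("character", "dollar",
      function(from) as.numeric(gsub("[$,]", "", from)))

# Made-up example data: the price field is quoted because it contains commas
csv_text <- 'id,price\n1,"$1,135,359.00"\n2,$12.50'

df <- read.csv(text = csv_text, colClasses = c("numeric", "dollar"))
```

Because the coercion function returns the result of as.numeric, the price column should come back as an ordinary numeric vector rather than an object of class dollar.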
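The post-processing for a mixed-separator file like that can be sketched as two passes of read.table. The sample lines and the column names k1, k2, k3 are invented for illustration; the real file is not shown in this message.

```r
# Made-up lines: first three fields comma separated, the rest whitespace separated
lines <- c("A1,x,10 1.5 2.5",
           "B2,y,20 3.5 4.5")

# Pass 1: split on whitespace, so the comma-joined fields arrive as one column
raw <- read.table(text = lines, stringsAsFactors = FALSE)

# Pass 2: re-read that first column with sep = "," to get three columns
parts <- read.table(text = raw$V1, sep = ",", stringsAsFactors = FALSE,
                    col.names = c("k1", "k2", "k3"))

result <- cbind(parts, raw[-1])

# Alternative when no field contains a literal comma or space:
# turn the commas into spaces first and read everything in one pass
one_pass <- read.table(text = gsub(",", " ", lines), stringsAsFactors = FALSE)
```

The one-pass variant stands in for a regular-expression sep, at the cost of rewriting the text before parsing.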
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
greg.snow at imail.org
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of David Winsemius
> Sent: Thursday, May 06, 2010 4:47 AM
> To: Wang, Kevin (SYD)
> Cc: r-help at r-project.org; Phil Spector
> Subject: Re: [R] Converting dollar value (factors) to numeric
> On May 5, 2010, at 11:31 PM, Wang, Kevin (SYD) wrote:
> > Hi Phil and all those who replied,
> > Thanks heaps! Yes, it worked to a certain extent. However, if I have the
> > following case:
> >> x <- c("$135,359.00", "$135359.00", "$1,135,359.00")
> >> y <- sub('\\$','',as.character(x))
> >> cost <- as.numeric(sub('\\,','',as.character(y)))
> Try gsub, it seems to be more "greedy" :
> cost <- as.numeric(gsub('\\,','',as.character(y)))
> > Warning message:
> > NAs introduced by coercion
> >> cost
> >  135359 135359 NA
> > Then the third value becomes NA -- though I suspect it probably has
> > something to do with the regular expression (which I'm not sure how to
> > fix) rather than R?
> > Thanks again for the help!
> > Cheers
> > Kev
> > -----Original Message-----
> > From: Phil Spector [mailto:spector at stat.berkeley.edu]
> > Sent: Wednesday, 5 May 2010 6:14 PM
> > To: Wang, Kevin (SYD)
> > Cc: r-help at r-project.org
> > Subject: Re: [R] Converting dollar value (factors) to numeric
> > Kev-
> > The most reliable way to do the conversion is as follows:
> >> x = factor(c('$112.11','$119.15','$121.32'))
> >> as.numeric(sub('\\$','',as.character(x)))
> >  112.11 119.15 121.32
> > This way negative quantities and numbers without dollar signs are
> > handled correctly. There's certainly no need to create a new input
> > file.
> > It may be easier to understand as
> > as.numeric(sub('$','',as.character(x),fixed=TRUE))
> > which gives the same result.
> > - Phil Spector
> > Statistical Computing Facility
> > Department of Statistics
> > UC Berkeley
> > spector at stat.berkeley.edu
> > On Wed, 5 May 2010, Wang, Kevin (SYD) wrote:
> >> Hi,
> >> I'm trying to read a bunch of CSV files into R where many columns
> >> are coded like $111.11. When reading them in they are treated as
> >> factors.
> >> I'm wondering if there is an easy way to convert them into numeric in
> >> R (as I don't want to modify the source data)? I've done some
> >> searches and can't seem to find an easy way to do this.
> >> I apologise if this is a trivial question, I haven't been using R for
> >> a while.
> >> Many thanks in advance!
> >> Cheers
> >> Kev
> >> Kevin Wang
> >>> Senior Advisor, Health and Human Services Practice Government
> >>> Advisory Services
> >>> KPMG
> >>> 10 Shelley Street
> >>> Sydney NSW 2000 Australia
> >>> Tel +61 2 9335 8282
> >>> Fax +61 2 9335 7001
> >> kevinwang at kpmg.com.au
> >>> Protect the environment: think before you print
> >> [[alternative HTML version deleted]]
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> David Winsemius, MD
> West Hartford, CT