[R] Converting dollar value (factors) to numeric
David Winsemius
dwinsemius at comcast.net
Thu May 6 20:39:28 CEST 2010
On May 6, 2010, at 2:14 PM, Greg Snow wrote:
> This can be further simplified by combining the 2 subs into a single
> gsub('[$,]','',as.character(x)).
>
> The catch is that this will also convert a malformed string such as
> "$123$35,24,,$1$$2,,3.4" into a number (1233524123.4), when you
> probably want something like that to give a warning and/or an NA
> value.
>
> The g in gsub stands for global (meaning replace every '$' and ','
> not just the first one) rather than greedy (which has a different
> meaning in regular expressions).
>
> This discussion brings up a related issue that I have thought about
> for a while. In the help for read.table in the section on
> colClasses it says that you can specify other conversions from
> character as long as there is a method for as corresponding to what
> you put in.
>
> This suggests to me the approach of writing a conversion function
> called something like "as.dollar" then setting
> colClasses=c('numeric','dollar','dollar','factor') or something like
> that and having the middle 2 columns run through the function.
> However my first quick attempt failed (the doc says the method needs
> to be in the methods package and my quick attempt with setMethod
> created a local copy). There is also the possible problem that this
> would create a column with class dollar when I want a simple numeric.
>
> So this brings up 2 questions:
>
> 1. Has anyone found a way to create a method for as in the methods
> package such that my idea above would work? (Preferably without much
> more work than the post-processing already suggested.)
I do get a warning, but it does seem to "work" as intended. This
basically follows, as best I could, a suggestion Gabor Grothendieck made
a couple of months ago. Here is a link to an earlier post, followed by a
colClasses method that strips "$" and ",":
http://finzi.psych.upenn.edu/Rhelp10/2010-February/229550.html
> Input <- "$245,000,000\n 3,000.000\n $$$34"
> setAs("character", "num.with.commas.dolsign",
+ function(from) as.numeric(gsub(",|\\$", "", from)))
Warning message:
In matchSignature(signature, fdef, where) :
in the method signature for function "coerce" no definition for
class: “num.with.commas.dolsign”
> DF <- read.table(textConnection(Input), header = FALSE,
+ colClasses = c("num.with.commas.dolsign"))
> DF
        V1
1 2.45e+08
2 3.00e+03
3 3.40e+01
> sprintf("%12.2f", DF$V1)
[1] "245000000.00" "     3000.00" "       34.00"
Any help with cleaning up the S4 incantations would be welcome.
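One possible cleanup, offered as a sketch rather than a tested recipe: the warning comes from setAs() registering a coercion to a class that has not been defined. Declaring the target as a virtual S4 class with setClass() before calling setAs() should avoid it. read.table() passes any colClasses entry it does not handle itself to methods::as(), so the column should still come back as a plain numeric vector, which also bears on Greg's concern about ending up with a "dollar" class. The class name is David's; the text= argument of read.table() only exists in later versions of R, and textConnection(Input) works just as well.

```r
library(methods)

## Declare the target class first (virtual: no objects of it are ever
## created), so setAs() has a class definition to refer to.
setClass("num.with.commas.dolsign", representation("VIRTUAL"))

## Coercion used by read.table() for columns given this colClasses entry:
## strip every "$" and "," and convert to a plain numeric vector.
setAs("character", "num.with.commas.dolsign",
      function(from) as.numeric(gsub("[$,]", "", from)))

Input <- "$245,000,000\n 3,000.000\n $$$34"
DF <- read.table(text = Input, header = FALSE,
                 colClasses = "num.with.commas.dolsign")
str(DF)   # V1 should be an ordinary numeric column
```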
--
David.
> 2. If the answer to 1 above is no, are others interested in this
> type of functionality and we should move the discussion to r-devel
> as a feature request?
>
> Even nicer would be a simple way to go from a single character
> vector to multiple columns in the data frame. I remember once working
> with a file where the first 3 columns were comma separated (no
> spaces), but everything after that was whitespace separated. I
> read it in as whitespace separated, then had to post-process the 1st
> column into 3. But getting all the semantics of one-to-many could
> be tricky. That particular case would also have been easier if the
> sep argument to read.table could be a regular expression, but that
> would probably slow things down for the simple cases.
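For the specific layout Greg describes, one workaround that needs no regular-expression sep is to rewrite each line before parsing: turn the commas inside the first whitespace-delimited token into spaces, then let read.table() split everything on whitespace. The sample lines are made up for illustration, and the sketch assumes no field contains embedded spaces.

```r
## Made-up sample: first 3 fields comma-separated (no spaces),
## remaining fields whitespace-separated.
lines <- c("a,1,2 3.5 x",
           "b,4,5   6.25 y")

first <- sub("[[:space:]].*$", "", lines)    # leading token, e.g. "a,1,2"
rest  <- sub("^[^[:space:]]+", "", lines)    # everything after that token
fixed <- paste(gsub(",", " ", first), rest)  # commas -> spaces, first token only

DF <- read.table(text = fixed, header = FALSE)
DF
```

With a real file, lines would come from readLines() instead of the inline vector.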
>
>
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.snow at imail.org
> 801.408.8111
>
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org
>> [mailto:r-help-bounces at r-project.org] On Behalf Of David Winsemius
>> Sent: Thursday, May 06, 2010 4:47 AM
>> To: Wang, Kevin (SYD)
>> Cc: r-help at r-project.org; Phil Spector
>> Subject: Re: [R] Converting dollar value (factors) to numeric
>>
>>
>> On May 5, 2010, at 11:31 PM, Wang, Kevin (SYD) wrote:
>>
>>> Hi Phil and all those who replied,
>>>
>>> Thanks heaps! Yes, it worked to a certain extent. However, if I have
>>> the following case:
>>>> x <- c("$135,359.00", "$135359.00", "$1,135,359.00")
>>>> y <- sub('\\$','',as.character(x))
>>>> cost <- as.numeric(sub('\\,','',as.character(y)))
>>
>> Try gsub; it seems to be more "greedy":
>>
>> cost <- as.numeric(gsub('\\,','',as.character(y)))
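To see why the third value became NA: sub() replaces only the first match, so "$1,135,359.00" keeps one comma and as.numeric() cannot parse it, whereas gsub() replaces every match. A small check on Kevin's vector:

```r
x <- c("$135,359.00", "$135359.00", "$1,135,359.00")
y <- sub("$", "", x, fixed = TRUE)   # drop the "$" (fixed = TRUE: literal match)

sub(",", "", y)    # first comma only: third element is still "1135,359.00"
gsub(",", "", y)   # all commas: "135359.00" "135359.00" "1135359.00"

cost <- as.numeric(gsub(",", "", y))
cost               # values 135359, 135359, 1135359 with no NAs
```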
>>
>> --
>> David
>>> Warning message:
>>> NAs introduced by coercion
>>>> cost
>>> [1] 135359 135359 NA
>>>
>>> Then the third value becomes NA, though I suspect this probably has
>>> more to do with the regular expression (which I'm not sure how to
>>> fix) than with R?
>>>
>>> Thanks again for the help!
>>>
>>> Cheers
>>> Kev
>>>
>>> -----Original Message-----
>>> From: Phil Spector [mailto:spector at stat.berkeley.edu]
>>> Sent: Wednesday, 5 May 2010 6:14 PM
>>> To: Wang, Kevin (SYD)
>>> Cc: r-help at r-project.org
>>> Subject: Re: [R] Converting dollar value (factors) to numeric
>>>
>>> Kev-
>>> The most reliable way to do the conversion is as follows:
>>>
>>>> x = factor(c('$112.11','$119.15','$121.32'))
>>>> as.numeric(sub('\\$','',as.character(x)))
>>> [1] 112.11 119.15 121.32
>>>
>>> This way negative quantities and numbers without dollar signs are
>>> handled correctly. There's certainly no need to create a new input
>>> file.
>>>
>>> It may be easier to understand as
>>>
>>> as.numeric(sub('$','',as.character(x),fixed=TRUE))
>>>
>>> which gives the same result.
>>> - Phil Spector
>>> Statistical Computing Facility
>>> Department of Statistics
>>> UC Berkeley
>>> spector at stat.berkeley.edu
>>>
>>>
>>> On Wed, 5 May 2010, Wang, Kevin (SYD) wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm trying to read a bunch of CSV files into R where many columns
>>>> are coded like $111.11. When I read them in, they are treated as
>>>> factors.
>>>>
>>>> I'm wondering if there is an easy way to convert them into numeric
>>>> in R (as I don't want to modify the source data)? I've done some
>>>> searches and can't seem to find an easy way to do this.
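For files with many such columns, one approach is to read them as character (stringsAsFactors = FALSE) and convert every column whose non-empty values all start with "$". The inline CSV text, the column names, and that detection rule are assumptions for illustration only; looks_like_dollar is a hypothetical helper. If the columns have already been read as factors, go through as.character() first, as Phil notes.

```r
## Inline stand-in for one of the CSV files (made-up data)
csv <- 'id,cost,fee
1,"$111.11","$2,000.50"
2,"$119.15","$35.00"'

DF <- read.csv(text = csv, stringsAsFactors = FALSE)  # keep text as character

## Hypothetical rule: a column is "dollar" if it is character and every
## non-missing, non-empty value starts with "$". Adjust as needed.
looks_like_dollar <- function(col) {
  vals <- col[!is.na(col) & col != ""]
  is.character(col) && length(vals) > 0 && all(grepl("^\\$", vals))
}
is_dollar <- vapply(DF, looks_like_dollar, logical(1))

## Strip every "$" and "," and convert those columns to numeric
DF[is_dollar] <- lapply(DF[is_dollar],
                        function(col) as.numeric(gsub("[$,]", "", col)))
str(DF)
```

Negative amounts written as "-$5.00" would not match this detection rule, so a file containing them needs a looser pattern.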
>>>>
>>>> I apologise if this is a trivial question; I haven't been using R
>>>> for a while.
>>>>
>>>> Many thanks in advance!
>>>>
>>>> Cheers
>>>>
>>>> Kev
>>>>
>>>> Kevin Wang
>>>> Senior Advisor, Health and Human Services Practice
>>>> Government Advisory Services
>>>>
>>>> KPMG
>>>> 10 Shelley Street
>>>> Sydney NSW 2000 Australia
>>>>
>>>> Tel +61 2 9335 8282
>>>> Fax +61 2 9335 7001
>>>>
>>>> kevinwang at kpmg.com.au
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>
>> David Winsemius, MD
>> West Hartford, CT
>>
David Winsemius, MD
West Hartford, CT