[R] removing specified length of text after a period in dataframe of char's

Aidan Corcoran aidan.corcoran11 at gmail.com
Wed Dec 7 14:20:10 CET 2011


Hi Sarah,

Apologies for the excess. Here is a smaller example:

f<-structure(list(c("GDP per capita (LCU)", "Ratio to EZ GDP Per Cap"
), `2005` = c(32128, 0.1), `2009` = c(52163, 0.1), `2010` = c(63100,
0.1), `2011` = c(72461, 0.1), `2012` = c(81313, 0.1)), .Names = c("",
"2005", "2009", "2010", "2011", "2012"), row.names = 3:4, class = c("cast_df",
"data.frame"))

nam2<-
structure(list(var1 = c("GDP per capita (LCU)", "Ratio to EZ GDP Per Cap"
), digi = c(0, 1)), .Names = c("var1", "digi"), row.names = c("98",
"110"), class = "data.frame")

I'm trying to place a thousand separator in the numbers in the table f:

> f
                             2005    2009    2010    2011    2012
3    GDP per capita (LCU) 32128.0 52163.0 63100.0 72461.0 81313.0
4 Ratio to EZ GDP Per Cap     0.1     0.1     0.1     0.1     0.1

and also have precision given by variable digi:

> nam2
                       var1 digi
98     GDP per capita (LCU)    0
110 Ratio to EZ GDP Per Cap    1

Calling format(), as in
  hi<-format(f,big.mark=",",scientific=FALSE)
gives me the comma, but now I'm not sure how to get the precision.
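
One possible way to get both the separator and the per-row precision is to format the numbers directly rather than trimming the already-formatted strings. This is only a minimal sketch, assuming base R's formatC(); the names f2, nam, d, fmt_one and hi2 are hypothetical, and f2 and nam are cut-down plain data frames standing in for f and nam2:

```r
# Hypothetical stand-ins for f and nam2 (plain data frames, two rows).
f2 <- data.frame(var1 = c("GDP per capita (LCU)", "Ratio to EZ GDP Per Cap"),
                 `2005` = c(32128, 0.1),
                 `2012` = c(81313, 0.1),
                 check.names = FALSE, stringsAsFactors = FALSE)
nam <- data.frame(var1 = c("GDP per capita (LCU)", "Ratio to EZ GDP Per Cap"),
                  digi = c(0, 1), stringsAsFactors = FALSE)

# Look up the number of decimals for each row of f2 by variable name.
d <- nam$digi[match(f2$var1, nam$var1)]

# Format one number with d decimals and a comma as thousands separator.
fmt_one <- function(x, d) formatC(x, format = "f", digits = d, big.mark = ",")

# Apply the row-wise precision to every numeric column.
hi2 <- f2
hi2[-1] <- lapply(f2[-1], function(col) mapply(fmt_one, col, d))
hi2
```

Because the precision is looked up by name with match(), the rows of the table and the rows of the precision table do not need to be in the same order.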

Your answer seems to be doing what I want, although when I changed the
testdata slightly
> testdata[1,1]<-10000
> hi<-format(testdata,big.mark=",",scientific=F)
> hi
    values digits
1 10,000.0      0
2      5.3      1
3      1.1      2
> apply(hi, 1, function(x)sub(paste("(^.*\\.\\d{", x[2], "})(\\d*)", sep=""), "\\1", x[1]))
         1          2          3
 "10,000." "     5.3" "     1.1"
The decimal point appears to be left behind after 10,000.
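
A likely reason is that the period sits inside the first capture group of the pattern, so when the digit count is 0 the group still matches everything up to and including the period, and that is what gets kept. A minimal sketch of one way around this, treating the zero-digit case separately; trim_decimals, s and d are hypothetical names, not part of the thread:

```r
# Keep d digits after the period in a formatted string s.
# With d == 0 the period itself is dropped as well.
trim_decimals <- function(s, d) {
  if (d == 0) {
    sub("\\.\\d*$", "", s)
  } else {
    # Keep the period plus d digits, drop any digits after that.
    sub(paste0("(\\.\\d{", d, "})\\d*$"), "\\1", s)
  }
}

s <- c("10,000.0", "5.3", "1.1")
d <- c(0, 1, 2)
trimmed <- mapply(trim_decimals, s, d, USE.NAMES = FALSE)
trimmed
```

As in the string-based approach quoted below, this only removes digits; it does not pad with zeros, so "1.1" with two digits stays "1.1".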

Unfortunately your approach is a bit too advanced for me, so I can't
adapt it. Perhaps you could recommend somewhere where I could read up
on what the caret and other symbols mean in your paste call?
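
For reference, the symbols are regular-expression syntax, which R documents on the ?regex help page. A short annotated sketch of the pattern from the paste call, with the digit count fixed at 1; the input string and the names pattern and res are made up for illustration:

```r
# The pattern that paste() builds when the digit count is 1.
pattern <- "(^.*\\.\\d{1})(\\d*)"
# ^        anchors the match at the start of the string
# .*       any run of characters
# \\.      a literal period (an unescaped . would match any character)
# \\d{1}   exactly one digit
# ( ... )  a capture group; "\\1" in the replacement means the first group
# (\\d*)   any remaining digits, captured as group 2 and not kept

res <- sub(pattern, "\\1", "1,234.5678")
res
```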

Thanks for your help!

Aidan

On Wed, Dec 7, 2011 at 12:05 PM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
> Hi,
>
> Example data is crucial, but small simple example data is even better.
> I'm too lazy to figure out which bits I need from your data, so here's
> a simple example of one way to approach your question. You could
> use gsub() in very much the same manner if you need more complex
> output.
>
>> testdata <- data.frame(values=c(2.0, 5.3, 1.1), digits=c(0, 1, 2))
>> testdata
>  values digits
> 1    2.0      0
> 2    5.3      1
> 3    1.1      2
> # a nice way that works on numbers
>> apply(testdata, 1, function(x)sprintf(paste("%0.", x[2], "f", sep=""), x[1]))
> [1] "2"    "5.3"  "1.10"
>
> # a messy way that works on strings
>> apply(testdata, 1, function(x)sub(paste("(^.*\\.\\d{", x[2], "})(\\d*)", sep=""), "\\1", x[1]))
> [1] "2"   "5.3" "1.1"
>
> Also note that the second method will not add zeros to pad out the
> end. If you need that, I'd consider rearranging the order of your
> steps so that you can use sprintf().
>
> Someone else might have a more flexible way too; I'd be interested to see it.
> Unfortunately I don't think sprintf() has a way to insert a thousands separator,
> or that would be a one-step solution.
>
> Sarah
>
> On Wed, Dec 7, 2011 at 6:05 AM, Aidan Corcoran
> <aidan.corcoran11 at gmail.com> wrote:
>>  Dear all,
>>
>>  I'm trying to remove some text after the period (a decimal point) in
>> the data frame 'hi', below. This is one step in formatting a table. So
>> I would like e.g.
>> "2.0" to become "2"
>> and "5.3" to be "5.3",
>> where the variable digordered contains the number of digits after the
>> decimal that I would like to display, in the same order in which the
>> variables appear in hi. If it makes it easier to use, this info is
>> also contained in the dataframe nam2. The reason the numbers are
>> recorded as characters is because I used format to get a thousand
>> separator, which I also need.
>>
>> The string manipulation functions in R generally don't seem to work
>> with matrices or data frames, so e.g.   regexpr("\\.",  hi[1,2]) works
>> but not regexpr("\\.", hi). Finding the location of the period and
>> then using substring was the approach I was thinking of taking, but
>> this would seem to need for loops here. I was wondering if anyone
>> knows any easier ways.
>>
>> Thanks very much for any help!
>>
>> Aidan
>>
>>
>> digordered<-  c(0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1)
>> f<-structure(list(c("GDP (LCU,bn)", "GDP ($, bn)", "GDP per capita (LCU)",
>> "Ratio to EZ GDP Per Cap", "Share of World GDP (Intl $, %)",
>> "Real GDP Growth (%)", "Population (mn)", "Unemployment Rate (%)",
>> "Ratio of Employed/Unemployed", "PPP Exchange Rate",
>> "Nominal Exchange Rate (LCU per $)",
>> "Inflation (%)", "Main Lending Rate to Private Sector (%)",
>> "Claims on Central Gov",
>> "Claims on Private Sector", "Bank Assets", "Regulator Capital to RWA",
>> "Tier 1 Capital to RWA", "Return on Equity", "Liquid Assets to ST Liabilities"
>> ), `2005` = c(35662, 809, 32128, 0.1, 4.3, 9, 1110, 3.5, NA,
>> 14.7, 44.1, 4, 10.8, 7, 15, 22835, NA, NA, NA, NA), `2009` = c(61240,
>> 1265, 52163, 0.1, 5.2, 6.8, 1174, NA, NA, 16.8, 48.4, 10.9, 12.2,
>> 14, 31, 47180, 13.6, 9, 10.8, 42.8), `2010` = c(75122, 1632,
>> 63100, 0.1, 5.5, 10.1, 1191, NA, NA, 18.5, 45.7, 12, NA, 15,
>> 39, 56787, 14.7, 9.9, 10.5, 41.1), `2011` = c(87455, 1843, 72461,
>> 0.1, 5.7, 7.8, 1207, NA, NA, 19.6, NA, 10.6, NA, NA, NA, NA,
>> 13.5, 9.3, 14.3, 35.8), `2012` = c(99459, 2013, 81313, 0.1, 5.9,
>> 7.5, 1223, NA, NA, 20.5, NA, 8.6, NA, NA, NA, NA, NA, NA, NA,
>> NA)), .Names = c("", "2005", "2009", "2010", "2011", "2012"), row.names = c(NA,
>> 20L), class = c("cast_df", "data.frame"))
>>
>>  hi<-format(f,big.mark=",",scientific=F)
>>  regexpr("\\.",  hi) # don't know how to get the location of "." in a data frame of chars
>>
>>
>> nam2<-  structure(list(var1 = c("GDP (LCU,bn)", "GDP ($, bn)",
>> "GDP per capita (LCU)",
>> "Ratio to EZ GDP Per Cap", "GDP per capita (Intl $)",
>> "EU GDP per capita (Intl $)",
>> "Share of World GDP (Intl $, %)", "Real GDP Growth (%)", "Population (mn)",
>> "Unemployment Rate (%)", "Ratio of Employed/Unemployed", "Employment (1000s)",
>> "Unemployment (1000s)", "PPP Exchange Rate",
>> "Nominal Exchange Rate (LCU per $)",
>> "Inflation (%)", "Main Lending Rate to Private Sector (%)",
>> "Claims on Central Gov",
>> "Claims on Private Sector", "Bank Assets", "Regulator Capital to RWA",
>> "Tier 1 Capital to RWA", "Return on Equity", "Liquid Assets to ST Liabilities",
>> "Reserves"), digi = c(0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0,
>> 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0)), .Names = c("var1", "digi"
>> ), row.names = c("96", "97", "98", "110", "99", "100", "101",
>> "102", "103", "111", "112", "104", "105", "106", "107", "108",
>> "109", "114", "115", "113", "119", "120", "121", "122", "116"
>> ), class = "data.frame")
>>
>
>
>
> --
> Sarah Goslee
> http://www.stringpage.com
> http://www.sarahgoslee.com
> http://www.functionaldiversity.org
