[R] substituting dots in the names of the columns (sub, gsub, regexpr)
Gabor Grothendieck
ggrothendieck at gmail.com
Thu Jul 26 16:07:12 CEST 2007
Use "\\." or "[.]" (in quotes) to denote a literal dot (#1),
or use fixed = TRUE to turn off the special meaning of the dot (#2), or
use a zero-width lookahead assertion (?=[.]), which must match
but is not included in the text that gets replaced (#3). Try ?regexpr .
Also, the links on the gsubfn home page (http://code.google.com/p/gsubfn/)
point to a number of good resources on regular expressions.
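Str <- c("y..m.", "BD..g.cm3.", "PR..Mpa.", "Ks..m.s.", "SP.g..g.",
"P..m3.m3.", "theta1..g.g.", "theta2..g.g.", "AWC..g.g.")
# 1 - "[.]" matches only a literal dot
tmp <- gsub("[.]+", ".", Str)   # collapse each run of dots into one dot
sub("[.]+$", "", tmp)           # remove the trailing dot(s)
# 2 - with fixed = TRUE, ".." is a plain string, not a regex
# (each ".." pair becomes "."; a run of three dots would leave two)
tmp <- gsub("..", ".", Str, fixed = TRUE)
sub("[.]+$", "", tmp)
# 3 - both done at once using zero-width lookahead:
# "[.]*$" removes trailing dots; "[.]*(?=[.])" removes every dot in a
# run except the last one
gsub("[.]*$|[.]*(?=[.])", "", Str, perl = TRUE)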
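The following sketch checks that the three approaches above agree. It assumes the target names are the ones the poster listed, with the stray trailing dot on "P.m3.m3." dropped. The names `want`, `r1`, `r2` and `r3` are illustrative and do not come from the original message. The last line isolates the lookahead used in #3: a dot is deleted only when another dot follows it, and that following dot is not consumed by the match.

```r
Str <- c("y..m.", "BD..g.cm3.", "PR..Mpa.", "Ks..m.s.", "SP.g..g.",
         "P..m3.m3.", "theta1..g.g.", "theta2..g.g.", "AWC..g.g.")

# assumed target: runs of dots collapsed, trailing dots removed
want <- c("y.m", "BD.g.cm3", "PR.Mpa", "Ks.m.s", "SP.g.g", "P.m3.m3",
          "theta1.g.g", "theta2.g.g", "AWC.g.g")

r1 <- sub("[.]+$", "", gsub("[.]+", ".", Str))                   # #1
r2 <- sub("[.]+$", "", gsub("..", ".", Str, fixed = TRUE))      # #2
r3 <- gsub("[.]*$|[.]*(?=[.])", "", Str, perl = TRUE)           # #3

# all three should give the same names
stopifnot(identical(r1, want), identical(r2, want), identical(r3, want))

# the lookahead alone: only dots followed by another dot are dropped
gsub("[.](?=[.])", "", "a...b", perl = TRUE)  # "a.b"
```

Approach #2 only handles pairs of dots, which is enough for these names. Approach #1 also copes with longer runs.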
On 7/26/07, 8rino-Luca Pantani <ottorino-luca.pantani at unifi.it> wrote:
> Dear R users,
> I have the following two problems, related to the functions sub, grep,
> regexpr and similar ones.
>
> The header of the file(s) I have to import looks like this:
>
> c("y (m)", "BD (g/cm3)", "PR (Mpa)", "Ks (m/s)", "SP g./g.",
> "P (m3/m3)", "theta1 (g/g)", "theta2 (g/g)", "AWC (g/g)")
>
> To get rid of spaces and symbols in the names of the columns,
> I use read.table(... check.names=TRUE) and I get:
> str <- c("y..m.", "BD..g.cm3.", "PR..Mpa.", "Ks..m.s.", "SP.g..g.",
> "P..m3.m3.", "theta1..g.g.", "theta2..g.g.", "AWC..g.g.")
>
> Now, my problem is to remove the trailing dots, as well as the double
> dots, in order to get names like the following:
> c("y.m", "BD.g.cm3", "PR.Mpa", "Ks.m.s", "SP.g.g", "P.m3.m3",
> "theta1.g.g", "theta2.g.g", "AWC.g.g")
>
> I've searched the help pages for sub, regexpr and similar functions, and
> I have also searched the help archives.
> I understand that the dot is a special character, since
> sub("..", ".", str)
> [1] "..m." "...g.cm3." "...Mpa." "...m.s." "..g..g."
> [6] "..m3.m3." ".eta1..g.g." ".eta2..g.g." ".C..g.g."
>
> Therefore I tried
> sub("\\..", ".", str)
> [1] "y.m." "BD.g.cm3." "PR.Mpa." "Ks.m.s." "SP...g."
> [6] "P.m3.m3." "theta1.g.g." "theta2.g.g." "AWC.g.g."
> and I was surprised by the (to me) strange behaviour with "SP.g..g.",
> which was changed into "SP...g."
> And this is the first problem I cannot solve.
>
> Then there's the problem of trailing dot removal.
> In
> http://tolstoy.newcastle.edu.au/R/e2/help/07/01/8665.html
> I've found a somewhat similar problem, but its solution does not work in
> this case, since:
> gsub("[.].*", "", str)
> [1] "y" "BD" "PR" "Ks" "SP" "P" "theta1" "theta2"
> [9] "AWC"
> And this is the second problem.
>
> Apart from these particular problems, I would like to learn more about
> regexps, sub and so on, since each time
> I have strings to manipulate, I must face my ignorance of
> regular expressions and their syntax.
>
> Is there any page with examples where I can improve my knowledge and
> stop being frustrated each time I have to manipulate strings?
>
> 8rino
>
> --
> Ottorino-Luca Pantani, Università di Firenze
> Dip. Scienza del Suolo e Nutrizione della Pianta
> P.zle Cascine 28 50144 Firenze Italia
> Tel 39 055 3288 202 (348 lab) Fax 39 055 333 273
> OLPantani at unifi.it
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
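The sketch below reproduces the two problems from the question and shows why each one happens. It relies on the documented behaviour that read.table(check.names = TRUE) adjusts names with make.names(), which turns every character not allowed in a syntactic name into a dot. The variable names (`hdr`, `nm`, `first_pair` and so on) are only illustrative. `nm` is used instead of `str` so that the base function str() is not masked.

```r
# header as given in the question
hdr <- c("y (m)", "BD (g/cm3)", "PR (Mpa)", "Ks (m/s)", "SP g./g.",
         "P (m3/m3)", "theta1 (g/g)", "theta2 (g/g)", "AWC (g/g)")

# make.names() replaces each invalid character (space, "(", ")", "/")
# with "."; this reproduces the names shown in the question
nm <- make.names(hdr)
print(nm)

# Problem 1: "\\.." is a literal dot followed by ANY character.
# In "SP.g..g." the first such pair is ".g", and sub() replaces it
first_pair <- sub("\\..", ".", "SP.g..g.")    # "SP...g."

# Problem 2: "[.].*" is a dot followed by everything after it,
# so the string is cut at its first dot
truncated <- gsub("[.].*", "", "SP.g..g.")    # "SP"

# Escaping both dots targets only a real pair of dots ...
collapsed <- gsub("\\.\\.", ".", "SP.g..g.")  # "SP.g.g."
# ... and anchoring with $ targets only dots at the end
trimmed <- sub("[.]+$", "", collapsed)        # "SP.g.g"
```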