[R] How to create a new variable based on parts of another character variable: A generalization

Tue Oct 25 06:41:22 CEST 2011

Hi Bert

I am aware of the features of factors and, frankly speaking, I consider them
quite useful despite the prevalent preference for character vectors. For the
OP's question, though, an ifelse construction seems appropriate to me: by his
own statement he has two strings that should be converted to two other
strings, and he is just starting with R. I agree that when more levels have
to be changed, a factor is the way to go.
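
A minimal sketch of what I mean for the two-prefix case, assuming a data frame
DF whose column A starts with either AWI or UFT (the sample rows below are
made up for illustration and are not the OP's real data):

```r
# Hypothetical data modelled on the OP's example; only column A matters here.
DF <- data.frame(A = c("AWI-test1", "AWI-test5", "UFT-2", "UFT56"),
                 stringsAsFactors = FALSE)

# Two prefixes only: compare the first three characters and pick a label.
DF$D <- ifelse(substr(DF$A, 1, 3) == "AWI", "Arctic", "Boreal")
DF
```

Note that every value not starting with AWI is labelled "Boreal" here, so this
only suits the case where exactly two prefixes occur.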

Regards
Petr

> 
> ... Well, this works in this simple case, but is too clumsy for a general
> formulation of this problem: given a "dictionary" consisting of two
> character vectors of unique "names" (or two columns in a data frame), x
> and y, how does one convert a factor z with levels in x into the
> corresponding equivalent with levels in y?
> 
> There are likely a zillion ways to do this with various packages and
> functions, but the simplest and most straightforward must surely be:
> factor(y[z])
> 
> Example:
> > x <- LETTERS[1:4]
> > y <- LETTERS[5:8]
> > z <- factor(sample(x, 15, replace = TRUE))
> > z
>  [1] B D A C B A B D A D D A A D B
> Levels: A B C D
> > factor(y[z])
>  [1] F H E G F E F H E H H E E H F
> Levels: E F G H
> 
> This is a nice example of the utility of the factor data structure, which
> tends to get dissed a lot, because it can badly burn you if you're not
> careful with it.
> 
> A fuller discussion of these issues can be found by searching on
> "associative arrays" or "hashes", of which factors are an elementary
> example.
> 
> -- Bert
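
One caveat worth spelling out: factor(y[z]) indexes y by the integer codes of
z, so it relies on x being in the same order as levels(z) (alphabetical by
default). Below is a sketch that looks values up by name instead, assuming a
dictionary that is deliberately not sorted; all data and the names by_name
and by_code are invented for illustration:

```r
# Dictionary: x holds the old names, y the new ones (x is deliberately unsorted).
x <- c("D", "B", "A", "C")
y <- c("H", "F", "E", "G")

z <- factor(c("B", "D", "A", "C", "B"))

# match() finds each value of z in x by name, so the order of x is irrelevant.
by_name <- factor(y[match(as.character(z), x)])

# Indexing by the factor itself uses its integer codes (A=1, B=2, C=3, D=4).
by_code <- factor(y[z])

as.character(by_name)  # "F" "H" "E" "G" "F"
as.character(by_code)  # "F" "G" "H" "E" "F"
```

With x sorted, as in the example above, both forms agree; the match() version
simply does not depend on that.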
> 

> On Mon, Oct 24, 2011 at 6:00 AM, Petr PIKAL <petr.pikal at precheza.cz> wrote:
> Hi
> 
> If you want to avoid regular expressions altogether, and your A values
> start with AWI for Arctic and UFT for Boreal, you can use
> 
> DF$D <- ifelse(substr(DF$A, 1,1) == "A", "Arctic", "Boreal")
> 
> Regards
> Petr
> 
> 
> 
> >
> > Hello,
> > I am just starting with R and I am having a (most probably) stupid
> > problem creating a new variable in a data.frame based on a part of
> > another character variable.
> >
> > I have a data frame like this one:
> >
> >
> > A           B    C
> > AWI-test1   1    i
> > AWI-test5   2    r
> > AWI-tes75   56   z
> > UFT-2       5    I
> > UFT56       f    t
> > UFT356      9j   t
> > etc. etc.   89   t
> >
> >
> > I now want to check whether the string AWI is present in the variable A
> > and, if so, create a variable D containing "Arctic". If instead the
> > string UFT occurs in A, then D should be "Boreal", etc.
> >
> > The resulting data.frame should look like:
> > A           B    C   D
> > AWI-test1   1    i   Arctic
> > AWI-test5   2    r   Arctic
> > AWI-tes75   56   z   Arctic
> > UFT-2       5    I   Boreal
> > UFT56       f    t   Boreal
> > UFT356      9j   t   Boreal
> > etc. etc.   89   t
> >
> >
> > I know how to do this when matching the entire string in A, i.e. when A
> > is "AWI-test1", and then creating the variable D with "Arctic", but not
> > how to look for only a substring in A.
> > Would be great if somebody might help.
> > Thanks
> > Philipp
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> 
> -- 
> 
> Bert Gunter
> Genentech Nonclinical Biostatistics
> 
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
>