[R] How to create a new variable based on parts of another character variable: A generalization
Petr PIKAL
petr.pikal at precheza.cz
Tue Oct 25 06:41:22 CEST 2011
Hi Bert
I am aware of factor features and, frankly speaking, I consider them quite
useful despite the prevalent preference for character vectors. For the OP's
question it seems to me that the ifelse construction is appropriate: by his
own statement he has 2 strings which shall be converted to another two
strings, and he is just starting with R. I agree that when more levels need
to change, a factor is the way to go.
Regards
Petr
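For more than two codes, the factor route can be sketched as below. This is a minimal illustration, not code from the thread: the factor `site` and the lookup vector `map` are hypothetical, and the sketch assumes every level of `site` appears as a name in `map` (a level with no entry should come out as NA). Only the level labels are rewritten; the factor's integer codes stay as they are.

```r
# Hypothetical station codes stored as a factor
site <- factor(c("AWI", "UFT", "AWI", "UFT"))

# Hypothetical dictionary: code -> region (add more pairs as needed)
map <- c(AWI = "Arctic", UFT = "Boreal")

# Rename each level by looking it up in the dictionary;
# the integer codes of the factor are left untouched
region <- site
levels(region) <- map[levels(region)]

print(region)
```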
>
> ... Well, this works in this simple case, but is too clumsy for a general
> formulation of this problem: given a "dictionary" consisting of two
> character vectors of unique "names" (or two columns in a data frame), x
> and y, how does one convert a factor z with levels in x into the
> corresponding equivalent with levels in y?
>
> There are likely a zillion ways to do this with various packages and
> functions, but the simplest and most straightforward must surely be:
> factor(y[z])
>
> Example:
> > x <- LETTERS[1:4]
> > y <- LETTERS[5:8]
> > z <- factor(sample(x,15, rep=TRUE))
> > z
> [1] B D A C B A B D A D D A A D B
> Levels: A B C D
> > factor(y[z])
> [1] F H E G F E F H E H H E E H F
> Levels: E F G H
>
> This is a nice example of the utility of the factor data structure, which
> tends to get dissed a lot, because it can badly burn you if you're not
> careful with it.
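One way this can burn you: `y[z]` indexes `y` by the integer codes of `z`, not by its labels, so `factor(y[z])` is only correct when `levels(z)` coincides with `x` in the same order. In the example above that holds because `x` is already sorted and a sample of 15 almost certainly contains all four letters. A minimal sketch of the failure, using a hypothetical `z` that lacks some levels, next to a label-based alternative with `match()`:

```r
x <- LETTERS[1:4]              # dictionary keys:   A B C D
y <- LETTERS[5:8]              # dictionary values: E F G H
z <- factor(c("C", "D", "C"))  # levels are only "C" "D"

# y[z] indexes y by the integer codes of z (1, 2, 1), not by its labels
wrong <- factor(y[z])
# match() looks each label up in x, so the result does not depend on levels(z)
right <- factor(y[match(as.character(z), x)])

print(wrong)   # E F E
print(right)   # G H G
```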
>
> A fuller discussion of these issues can be found by searching
> on "associative arrays" or "hashes", of which factors are an elementary
> example.
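In base R, a named vector already behaves like a simple associative array, so the dictionary itself can be stored as one. A sketch reusing the `x` and `y` of the example above (the name `dict` is only illustrative). Indexing by the character labels makes the result independent of the order of `levels(z)`, and a key missing from the dictionary comes back as NA.

```r
x <- LETTERS[1:4]
y <- LETTERS[5:8]
z <- factor(c("B", "D", "A"))

# Dictionary as a named character vector: names are keys, elements are values
dict <- setNames(y, x)

# Look up by label (as.character), not by the factor's integer codes
mapped <- unname(dict[as.character(z)])
print(mapped)   # "F" "H" "E"
print(factor(mapped))
```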
>
> -- Bert
>
> On Mon, Oct 24, 2011 at 6:00 AM, Petr PIKAL <petr.pikal at precheza.cz> wrote:
> Hi
>
> If you want to get rid of regular expressions altogether, and your A values
> start with AWI for Arctic and UFT for Boreal, you can
>
> DF$D <- ifelse(substr(DF$A, 1,1) == "A", "Arctic", "Boreal")
>
> Regards
> Petr
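Note that `substr(DF$A, 1, 1) == "A"` tests only the first character and sends every other row, including any unrelated code, to "Boreal". If the codes may appear anywhere in the string, as the original question asks, `grepl()` with `fixed = TRUE` checks for the substring itself. A minimal sketch; the data frame is a hypothetical cut-down version of the one in the question, with an extra unmatched row `"XYZ-1"` added for illustration.

```r
DF <- data.frame(A = c("AWI-test1", "AWI-test5", "UFT-2", "UFT56", "XYZ-1"),
                 stringsAsFactors = FALSE)

# First-character test: everything not starting with "A" becomes "Boreal"
DF$D1 <- ifelse(substr(DF$A, 1, 1) == "A", "Arctic", "Boreal")

# Substring tests: rows matching neither code get NA instead of a wrong label
DF$D2 <- ifelse(grepl("AWI", DF$A, fixed = TRUE), "Arctic",
         ifelse(grepl("UFT", DF$A, fixed = TRUE), "Boreal", NA_character_))

print(DF)
```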
>
>
>
> >
> > Hello,
> > I am just starting with R and I am having a (most probably) stupid problem
> > with creating a new variable in a data.frame based on a part of another
> > character variable.
> >
> > I have a data frame like this one:
> >
> >
> > A          B    C
> > AWI-test1  1    i
> > AWI-test5  2    r
> > AWI-tes75  56   z
> > UFT-2      5    I
> > UFT56      f    t
> > UFT356     9j   t
> > etc. etc. 89 t
> >
> >
> > Now I want to check whether the string AWI is present in variable A and,
> > if so, create a variable D containing "Arctic". However, if the string
> > UFT occurs in variable A, then D shall be "Boreal", etc. etc.
> >
> > The resulting data.frame should look like this:
> > A          B    C    D
> > AWI-test1  1    i    Arctic
> > AWI-test5  2    r    Arctic
> > AWI-tes75  56   z    Arctic
> > UFT-2      5    I    Boreal
> > UFT56      f    t    Boreal
> > UFT356     9j   t    Boreal
> > etc. etc. 89 t
> >
> >
> > I know how to do this when I want to match the entire string of A, i.e.
> > when there is "AWI-test1", create the variable D with "Arctic", but not
> > how to look only for a substring in A.
> > It would be great if somebody could help.
> > Thanks
> > Philipp
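For the general case with many codes, the dictionary can also be kept as a small two-column data frame and looked up with `match()`. A minimal sketch, assuming (hypothetically) that the code is always the first three characters of `A`; the `lookup` table is illustrative, and rows whose prefix is not in it get NA.

```r
# Column A from the question (the "etc." rows omitted)
DF <- data.frame(A = c("AWI-test1", "AWI-test5", "AWI-tes75",
                       "UFT-2", "UFT56", "UFT356"),
                 stringsAsFactors = FALSE)

# Hypothetical dictionary: one row per code; extend with more codes as needed
lookup <- data.frame(code   = c("AWI", "UFT"),
                     region = c("Arctic", "Boreal"),
                     stringsAsFactors = FALSE)

# Take the assumed 3-character prefix and find its row in the dictionary
DF$D <- lookup$region[match(substr(DF$A, 1, 3), lookup$code)]

print(DF)
```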
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
>
>
>
> --
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
>