[R] NA problem when use paste function

Lu, Jiang lu.jjane at gmail.com
Thu Apr 17 16:23:00 CEST 2008


Thank you very much, Dr. Ripley. The "ifelse()" solution you provided
is exactly what I wanted. I was so happy to receive your email this
morning. Last night I was trying to write a loop to substitute the
NAs, but now I see that "ifelse()" does the job much more efficiently.
I really appreciate your help!
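
For the archive, here is a minimal sketch of how the suggestion might be
applied to merged columns like mine. The data frame `merged` and its three
rows are made up for illustration, and the as.character() calls are my own
precaution in case the columns are factors, since ifelse() on factors
returns the integer codes rather than the labels.

```r
# Hypothetical stand-in for the merged data; the values are made up.
merged <- data.frame(
  ID          = c(10003, 10033, 10038),
  S3Allele1.x = c("G", NA, NA),
  S3Allele1.y = c(NA, NA, "A"),
  stringsAsFactors = FALSE
)

# as.character() guards against factor columns, for which ifelse()
# would return the underlying integer codes instead of the labels.
x <- as.character(merged$S3Allele1.x)
y <- as.character(merged$S3Allele1.y)

# Take x where it is present, otherwise fall back to y.
# If both are missing, the result stays a real NA, not the string "NA".
merged$S3Allele1 <- ifelse(is.na(x), y, x)

print(merged$S3Allele1)  # "G" NA "A"
```

Unlike paste(), this keeps a true missing value in rows where neither
data set has the genotype, so is.na() still works on the new column.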

Jiang

On Thu, Apr 17, 2008 at 1:38 AM, Prof Brian Ripley
<ripley at stats.ox.ac.uk> wrote:
> On Wed, 16 Apr 2008, Lu, Jiang wrote:
>
> > Dear R helpers,
> >
> > I am working on a genetics project with two data sets, X and Y. Some
> > IDs appear in both data sets, and others in only one of them. I used
> > "merge(x,y,by="ID",all=TRUE)". Data set Y contains a variable (a
> > genotype) which is also in data set X. When I merged X with Y, these two
> > variables were automatically renamed by appending .x and .y to the
> > original variable names. As you can see in the listing below, I would
> > like to take whichever value is available (non-missing, non-NA) in X or Y
> > as the final value for the genotype S3Allele1. I used the paste() function.
> > However, it converts <NA> to the character string "NA". Would you please
> > tell me how I can get just the genotype without the NA pasted onto it? I
> > checked the documentation of paste() and noticed that it applies
> > as.character() to its vector arguments. I guess that is the reason I
> > got "NA" as a string in the new variable I created (S3Allele1).
> >
>
> Please don't 'guess': that is not what as.character does.
>
> Your example is not reproducible (see the footer of this message) and it is
> not clear what the structure is.  But <NA> indicates a missing value in a
> factor or unquoted character vector.  E.g.
>
> > x <- c("G", "A", "A")
> > y <- rep(NA_character_, 3)
> > data.frame(x, y)
>   x    y
> 1 G <NA>
> 2 A <NA>
> 3 A <NA>
> > paste(x, y)
> [1] "G NA" "A NA" "A NA"
>
> Here y does contain missing values and paste() converted them to "NA".
> As the help says:
>
>     Note that 'paste()' coerces 'NA_character_', the character missing
>     value, to '"NA"' which may seem undesirable, e.g., when pasting
>     two character vectors, or very desirable, e.g. in 'paste("the
>     value of p is ", p)'.
>
> Possibly you want
>
> ifelse(is.na(x), y, x)
>
> >
> > Should I use a different function to avoid this problem? Any insight is
> > appreciated!
> >
> >          ID      S3Allele1.x S3Allele1.y S3Allele1
> > 1       10003           G        <NA>      G NA
> > 2       10004           A        <NA>      A NA
> > 3       10005           A        <NA>      A NA
> > 4       10006           A        <NA>      A NA
> > 5       10007           G        <NA>      G NA
> > 6       10008           A        <NA>      A NA
> > 7       10009           A        <NA>      A NA
> > 8       10010           A        <NA>      A NA
> > 9       10011           A        <NA>      A NA
> > 10      10013           A        <NA>      A NA
> > 11      10014           A        <NA>      A NA
> > 12      10015           A        <NA>      A NA
> > 13      10016           A        <NA>      A NA
> > 14      10017           A        <NA>      A NA
> > 15      10018           A        <NA>      A NA
> > 16      10019           G        <NA>      G NA
> > 17      10020           A        <NA>      A NA
> > 18      10021           G        <NA>      G NA
> > 19      10022           A        <NA>      A NA
> > 20      10023           G        <NA>      G NA
> > 21      10024           G        <NA>      G NA
> > 22      10025           G        <NA>      G NA
> > 23      10027           G        <NA>      G NA
> > 24      10028           G        <NA>      G NA
> > 25      10029           G        <NA>      G NA
> > 26      10031           G        <NA>      G NA
> > 27      10032           A        <NA>      A NA
> > 28      10033        <NA>                   NA
> > 29      10035           A        <NA>      A NA
> > 30      10037           A        <NA>      A NA
> > 31      10038        <NA>           A      NA A
> > 32      10039        <NA>           A      NA A
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> >
>
> --
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>


