[R] paste with apply, spaces and NA

William Dunlap wdunlap at tibco.com
Fri May 8 00:36:33 CEST 2009


> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Sarah Goslee
> Sent: Thursday, May 07, 2009 3:00 PM
> To: r-help
> Subject: [R] paste with apply, spaces and NA
> 
> Hello everyone,
> 
> I've come up with a problem with using paste() inside apply() that I
> can't seem to solve.
> Briefly, if I'm using paste to collapse the rows of a data frame, AND
> the data frame contains strings with spaces, AND there are NA values
> in subsequent columns, then paste() introduces spaces. This only
> happens with that particular combination of data values and commands.
> I have a workaround - replacing NA with "NA" - but this seems odd.
> 
> Thanks for any thoughts,
> Sarah

Do you get similar results if you use 10 instead of NA
in your examples, with more spaces if you use 10000?

I think this has to do with apply's call to as.matrix(X)
when X is a data.frame with mixed numeric and character
or factor columns.  It calls format() on each numeric column
to convert its elements to strings with the same number
of characters in each string.  apply() rarely gives you what
you want on such mixed data.frames.
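
To see the padding described above in isolation, here is a small sketch. It is not from the original thread apart from the test3 data, and the names m, with_na, with_10 and with_10000 are only illustrative. It assumes a reasonably recent R in which as.matrix() on a data frame with mixed column types still formats numeric columns to a common width:

```r
# Rebuild test3 from the original post: numeric column D has one NA
test3 <- data.frame(A = rep(1, 5), B = rep("a", 5), C = rep("a b", 5),
                    D = c(2, 2, 2, NA, 2), stringsAsFactors = FALSE)

# format() pads every element to the width of the widest one;
# NA is formatted as the two-character string "NA"
with_na    <- format(c(2, 2, 2, NA, 2))     # " 2" " 2" " 2" "NA" " 2"
with_10    <- format(c(2, 2, 2, 10, 2))     # " 2" " 2" " 2" "10" " 2"
with_10000 <- format(c(2, 2, 2, 10000, 2))  # each string is 5 characters wide

# apply() converts the data frame with as.matrix() first, so paste()
# only ever sees the padded strings from column D
m <- as.matrix(test3)
print(unname(m[, "D"]))
print(nchar(with_10000))
```

Column A is not padded because all of its values already format to the same width.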

Pasting the columns without apply is faster and will
give the correct results.  I find it convenient to use
do.call here:
    > do.call(`paste`, c(unname(test3),list(sep=",")))
    [1] "1,a,a b,2"  "1,a,a b,2"  "1,a,a b,2"  "1,a,a b,NA" "1,a,a b,2"
(Strictly speaking, unname(as.list(test3)) would be more proper.  The
unname is required only if one of the column names is 'sep' or
'collapse'.)
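
As a rough sketch of how the do.call approach could be wrapped up for reuse: paste_rows below is a hypothetical helper, not a base R function, and the clash data frame is made up purely to exercise the column-name case mentioned above.

```r
# Hypothetical helper (not part of base R): paste the columns of a data
# frame together row by row, without going through apply()/as.matrix()
paste_rows <- function(df, sep = ",") {
  # as.list() makes the list of columns explicit; unname() stops a column
  # called 'sep' or 'collapse' from being matched to paste()'s arguments
  do.call(paste, c(unname(as.list(df)), list(sep = sep)))
}

# test3 as defined in the original post
test3 <- data.frame(A = rep(1, 5), B = rep("a", 5), C = rep("a b", 5),
                    D = c(2, 2, 2, NA, 2), stringsAsFactors = FALSE)
res3 <- paste_rows(test3)

# Made-up data frame whose column names clash with paste()'s argument names
clash <- data.frame(sep = c(1, 2), collapse = c("x", "y"),
                    stringsAsFactors = FALSE)
res_clash <- paste_rows(clash)

print(res3)
print(res_clash)
```

Because the numeric columns go straight to paste() and are never run through format(), no padding spaces appear.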

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com 

> 
> 
> R --vanilla
> # R version 2.9.0 (2009-04-17)
> # Fedora Core 10
> 
> > test1 <- data.frame(A = rep(1, 5), B = rep("a", 5), C = rep("a b", 5), D = rep(2, 5), stringsAsFactors=FALSE)
> >
> > # has an NA value in a column before the column containing strings with spaces
> > test2 <- test1
> > test2$B[4] <- NA
> >
> > # has an NA value in a column after the column containing strings with spaces
> > test3 <- test1
> > test3$D[4] <- NA
> 
> > str(test1)
> 'data.frame':	5 obs. of  4 variables:
>  $ A: num  1 1 1 1 1
>  $ B: chr  "a" "a" "a" "a" ...
>  $ C: chr  "a b" "a b" "a b" "a b" ...
>  $ D: num  2 2 2 2 2
> > str(test2)
> 'data.frame':	5 obs. of  4 variables:
>  $ A: num  1 1 1 1 1
>  $ B: chr  "a" "a" "a" NA ...
>  $ C: chr  "a b" "a b" "a b" "a b" ...
>  $ D: num  2 2 2 2 2
> > str(test3)
> 'data.frame':	5 obs. of  4 variables:
>  $ A: num  1 1 1 1 1
>  $ B: chr  "a" "a" "a" "a" ...
>  $ C: chr  "a b" "a b" "a b" "a b" ...
>  $ D: num  2 2 2 NA 2
> 
> > # works as expected
> > apply(test1, 1, paste, collapse=",")
> [1] "1,a,a b,2" "1,a,a b,2" "1,a,a b,2" "1,a,a b,2" "1,a,a b,2"
> 
> > # works as expected
> > # does NOT add spaces to the column with the NA value
> > apply(test2, 1, paste, collapse=",")
> [1] "1,a,a b,2"  "1,a,a b,2"  "1,a,a b,2"  "1,NA,a b,2" "1,a,a b,2"
> 
> > # introduces spaces in the column with the NA value
> > # only if that column is after a column that contains strings with spaces
> > apply(test3, 1, paste, collapse=",")
> [1] "1,a,a b, 2" "1,a,a b, 2" "1,a,a b, 2" "1,a,a b,NA" "1,a,a b, 2"
> 
> > # pasting the columns together manually works as expected
> > paste(test3$A, test3$B, test3$C, test3$D, sep=",")
> [1] "1,a,a b,2"  "1,a,a b,2"  "1,a,a b,2"  "1,a,a b,NA" "1,a,a b,2"
> 
> > # pasting a single row works as expected
> > paste(test3[3,], collapse=",")
> [1] "1,a,a b,2"
> 
> ## workaround
> > test3[is.na(test3)] <- "NA"
> > apply(test3, 1, paste, sep="", collapse=",")
> [1] "1,a,a b,2"  "1,a,a b,2"  "1,a,a b,2"  "1,a,a b,NA" "1,a,a b,2"
> 
> 
> 
> -- 
> Sarah Goslee
> http://www.functionaldiversity.org
> 
> 


