[Rd] grep, gsub, sub have problems with NA values (PR#3078)
Warnes, Gregory R
gregory_r_warnes at groton.pfizer.com
Sat May 24 09:35:49 MEST 2003
Oh dear, more careful checking shows that all elements of a factor get
converted to NA by formatC, but the results retain the factor levels:
> x <- factor(letters[1:5])
> formatC(x, width=8)
[1] <NA> <NA> <NA> <NA> <NA>
Levels: a b c d e
I have a hard time justifying this behavior. I expected it to act like
format.char:
> format.char(x,width=8)
[1] "a       " "b       " "c       " "d       " "e       "
Warning message:
format.char: coercing 'x' to 'character' in: format.char(x, width = 8)
This came up while I was formatting every element of a data frame to
width 8 so that I could create a fixed-width output file...
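A possible workaround, sketched under the assumption that the goal is only to pad each column to a fixed width: coerce every column to character before calling formatC, so the level labels (including ones like "NAME") are formatted instead of the factor object. The helper name pad_columns and the example data are hypothetical, and this has not been checked against the 2003 r-devel build discussed in this thread.

```r
# Hypothetical helper (not part of R): left-justify every column of a
# data frame to a fixed width, e.g. for a fixed-width output file.
# Assumption: each column is coerced to character *before* formatC(),
# so factor columns are formatted by their level labels rather than
# being handed to formatC() as factor objects.
pad_columns <- function(df, width = 8) {
  df[] <- lapply(df, function(col) {
    formatC(as.character(col), width = width, flag = "-")
  })
  df
}

# Illustrative data: a factor column with a level containing "NA".
d <- data.frame(id = factor(c("a", "NAME")), n = c(1, 22))
p <- pad_columns(d)
p$id

# One fixed-width line per row.
cat(apply(p, 1, paste, collapse = ""), sep = "\n")
```

Assigning through `df[] <-` keeps the column names and row names of the original data frame while replacing every column with its padded character version.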
-G
> -----Original Message-----
> From: Warnes, Gregory R
> Sent: Saturday, May 24, 2003 8:17 AM
> To: Warnes, Gregory R; 'Thomas Lumley'
> Cc: 'r-devel at stat.math.ethz.ch'
> Subject: RE: [Rd] grep, gsub, sub have problems with NA values (PR#3078)
>
>
>
> I see that this came out garbled. It should have read:
>
> formatC also has problems: it incorrectly converts any
> factor level *containing* the characters 'NA' to a missing value.
>
> > > formatC(factor("NAME"),width=8)
> > [1] <NA>
> > Levels: NAME
>
> -G
>
>
> > -----Original Message-----
> > From: Warnes, Gregory R
> > Sent: Friday, May 23, 2003 5:22 PM
> > To: 'Thomas Lumley'; Warnes, Gregory R
> > Cc: r-devel at stat.math.ethz.ch
> > Subject: RE: [Rd] grep, gsub, sub have problems with NA values (PR#3078)
> >
> >
> >
> > formatC also has the reverse problem: it detects any factor
> > containing the string "NA" and converts it to a factor:
> >
> > > formatC(factor("NAME"),width=8)
> > [1] <NA>
> > Levels: NAME
> >
> > -Greg
> >
> >
> >
> > > -----Original Message-----
> > > From: Thomas Lumley [mailto:tlumley at u.washington.edu]
> > > Sent: Friday, May 23, 2003 11:47 AM
> > > To: gregory_r_warnes at groton.pfizer.com
> > > Cc: r-devel at stat.math.ethz.ch
> > > Subject: Re: [Rd] grep, gsub, sub have problems with NA values (PR#3078)
> > >
> > >
> > > On Thu, 22 May 2003 gregory_r_warnes at groton.pfizer.com wrote:
> > >
> > > >
> > > > In a string context, grep, gsub, sub are improperly treating NA (missing) as
> > > > the string "NA", and returning unexpected results.
> > > >
> > >
> > > as were chartr, abbreviate, substr, substring, strsplit. Fixed in r-devel,
> > > for the case of NA in the `main' string. Haven't yet decided what to do
> > > about
> > >     grep(as.character(NA), x)
> > > or
> > >     substr(x,1,2) <- as.character(NA)
> > >
> > >
> > >
> > > -thomas
> > >
> >
>