[R] re placing <NA> in character column

David Winsemius dwinsemius at comcast.net
Thu Oct 29 00:14:26 CET 2009


On Oct 28, 2009, at 7:06 PM, David Winsemius wrote:

>
> On Oct 28, 2009, at 6:22 PM, Mark W. Miller wrote:
>
>>
>>
>>
>>
>> I am guessing that your <NA> is not really an NA_character,  but
>> rather a factor with a level of "<NA>".
>>
>> See if str(s1) confirms my suspicions.
>>
>>> df1 <- read.table(textConnection(" Firstname  Lastname Age
>> + 1    Bob       Smith   20
>> + 2   John       Clark   NA
>> + 3   Andy       <NA>    40"), header=T)
>>> levels(df1$Lastname)
>> [1] "<NA>"  "Clark" "Smith"
>>> levels(df1$Lastname)[1] <- "qqqq"
>>> levels(df1$Lastname)
>> [1] "qqqq"  "Clark" "Smith"
>>> df1
>>  Firstname Lastname Age
>> 1       Bob    Smith  20
>> 2      John    Clark  NA
>> 3      Andy     qqqq  40
>>
>> --  
>> David Winsemius, MD
>> Heritage Laboratories
>> West Hartford, CT
>>
>>
>>
>>
>>
>> Dr. Winsemius, I just tried your suggestion of typing str(s1).   
>> Below is the
>> code and the comments that R returned:
>>
>>> channel <- odbcConnectExcel('u:/test.xls')
>>> sqlTables(channel)
>>
>> TABLE_CAT TABLE_SCHEM TABLE_NAME   TABLE_TYPE REMARKS
>> 1  u:\\test        <NA>    Sheet1$ SYSTEM TABLE    <NA>
>> 2  u:\\test        <NA>    Sheet2$ SYSTEM TABLE    <NA>
>> 3  u:\\test        <NA>    Sheet3$ SYSTEM TABLE    <NA>
>>
>>>
>>> s1 <- sqlFetch(channel, "Sheet1")
>>>
>>> odbcClose(channel)
>>>
>>> s1
>>
>> Firstname Lastname Age
>> 1       Bob    Smith  20
>> 2      John    Clark  NA
>> 3      Andy     <NA>  40
>>
>>
>>> str(s1)
>>
>> 'data.frame':   3 obs. of  3 variables:
>> $ Firstname: Factor w/ 3 levels "Andy","Bob","John": 2 3 1
>> $ Lastname : Factor w/ 2 levels "Clark","Smith": 2 1 NA
>
> My guess was wrong.
>
>> $ Age      : num  20 NA 40
>>
>>> levels(s1$Lastname)
>>
>> [1] "Clark" "Smith"
>>
>>>
>>
>>
>>
>> When I used the code:
>>
>> levels(s1$Lastname)[1] <- "qqqq"
>>
>> R replaced Clark with qqqq and kept <NA> I guess because <NA> is not
>> considered a level here:
>>
>>
>>> levels(s1$Lastname)[1] <- "qqqq"
>
> When I did that on the dataset I already had examined "df1". I knew  
> that <NA> was a level. In your case is was not a named level. So you  
> should not have expected the same outcome as with my efforts.
>
> I am unable to explain why your methods did not work with that  
> structure:
>
> This seems to work:
>
> > df1$Lastname <- ifelse(is.na(df1$Lastname), "qqqq", df1$Lastname)
> > df1
>  Firstname Lastname Age
> 1       Bob        3  20
> 2      John        2  NA
> 3      Andy     qqqq  40

Oooops. Tripped up by factors and ifelse again. Would probably need to  
be:

 > df1$Lastname <- ifelse(is.na(df1$Lastname), "qqqq",  
as.character(df1$Lastname))
 > df1
   Firstname Lastname Age
1       Bob    Smith  20
2      John    Clark  NA
3      Andy     qqqq  40

>
>
>
>>> s1
>> Firstname Lastname Age
>> 1       Bob    Smith  20
>> 2      John     qqqq  NA
>> 3      Andy     <NA>  40
>>>
>>
>>
>>
>>
>> -- 
>

David Winsemius, MD
Heritage Laboratories
West Hartford, CT




More information about the R-help mailing list