[R] efficient way to make NAs of empty cells in a factor (or character)

Thu Aug 3 16:40:42 CEST 2006

Hi

Try setting

na.strings = ""

in the call to read.csv2. It works for me with read.delim:

> is.na(read.delim("clipboard", na.strings="")$mono)
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE

> read.delim("clipboard", na.strings="")$mono
[1] hruby     hruby     jemny     jemny     nejhrubsi nejhrubsi standard  standard  <NA>
Levels: hruby jemny nejhrubsi standard
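
For anyone who wants to try this without the original file, here is a minimal sketch. The inline csv_text is only my guess at what test.csv looks like (reconstructed from the printed data frame in the question), and stringsAsFactors = TRUE is set explicitly because recent R versions read strings as character by default.

```r
# Hypothetical inline stand-in for the poster's test.csv (semicolon-separated,
# as read.csv2 expects); the real file is not shown in the thread.
csv_text <- "id;id2;x;y
a;;1;
b;e;;2.2
;f;3;3.3
c;g;4;4.4"

# na.strings = "" makes empty fields NA in factor/character columns too.
test <- read.csv2(text = csv_text, dec = ".", na.strings = "",
                  stringsAsFactors = TRUE)

is.na(test$id)    # the empty cell in row 3 is now a real NA
levels(test$id)   # "" is no longer a level
```

Note that na.strings = "" replaces the default value "NA"; if the file may also contain the literal text NA, na.strings = c("", "NA") keeps both.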

Or, if the data frame has already been read in, you can replace the empty strings directly:

test[(test=="")] <- NA
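
A small sketch of that replacement, on a data frame rebuilt by hand from the printout in the question (so the construction below is an assumption, not the poster's actual object). The droplevels() call is my addition and not part of the thread: after the assignment the empty string is still listed as an unused level, and droplevels() removes it.

```r
# Hypothetical reconstruction of the poster's data frame, with "" as a level.
test <- data.frame(
  id  = factor(c("a", "b", "", "c")),
  id2 = factor(c("", "e", "f", "g")),
  x   = c(1L, NA, 3L, 4L),
  y   = c(NA, 2.2, 3.3, 4.4)
)

# test == "" is a logical matrix; every TRUE cell is set to NA.
test[test == ""] <- NA

# The "" level is now unused but still present; drop it from all factors.
test <- droplevels(test)

is.na(test$id)
levels(test$id)
```

Assigning a single NA works here because one value is recycled over all matching cells.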

HTH
Petr

On 3 Aug 2006 at 15:46, Henrik Parn wrote:

Date sent:      	Thu, 03 Aug 2006 15:46:32 +0200
From:           	Henrik Parn <henrik.parn at bio.ntnu.no>
Organization:   	NTNU
To:             	R-help <r-help at stat.math.ethz.ch>
Subject:        	[R] efficient way to make NAs of empty cells in a factor (or
	character)
Send reply to:  	henrik.parn at bio.ntnu.no

> Dear all,
> 
> I have some csv-files (originating from Excel-files) containing empty
> cells. In my example file I have four variables of different classes,
> each with some empty cells in the original csv-file:
> 
>  > test <- read.csv2("test.csv", dec=".")
> 
>  > test
>   id id2  x   y
> 1  a      1  NA
> 2  b   e NA 2.2
> 3      f  3 3.3
> 4  c   g  4 4.4
> 
> 
>  > class(test$id)
> [1] "factor"
>  > class(test$id2)
> [1] "factor"
>  > class(test$x)
> [1] "integer"
>  > class(test$y)
> [1] "numeric"
> 
> The help text of read.csv2 says 'Blank fields are also
> considered to be missing values in logical, integer, numeric and
> complex fields.' Thus, empty cells in a factor (or, I assume, in a
> character vector) are not treated as missing values but as a level
> of their own:
> 
>  > is.na(test$id)
> [1] FALSE FALSE FALSE FALSE
>  > levels(test$id)
> [1] ""  "a" "b" "c"
> 
> When I work with my real (larger) dataset I would like to use
> functions like 'is.na' and '!is.na' on factors. Now I wonder if there
> is an R alternative to Excel's 'search (for empty cells) and replace
> (with NA)'?
> 
> I have tried a modification of Uwe Ligges' suggestion on missing
> values, posted 2 Aug:
>  > is.na(test[test==""]) <- TRUE
> 
> ...but it did not work on the data set:
> 
> Error in "[<-.data.frame"(`*tmp*`, test == "", value = c(NA, NA, NA,
> NA :
>         rhs is the wrong length for indexing by a logical matrix
> 
> 
> However it worked fine when applied to a single vector:
> 
>  > is.na(test$id[test$id==""]) <- TRUE
>  > test$id
> [1] a    b    <NA> c  
> Levels:  a b c
> 
>  > is.na(test$id)
> [1] FALSE FALSE  TRUE FALSE
> 
> Is there a more efficient way to fill empty cells in all my factors in
> R or should I just do it in advance in Excel by 'search and replace'?
> 
> Thanks in advance!
> 
> -- 
> ************************
> Henrik Pärn
> Department of Biology
> NTNU
> 7491 Trondheim
> Norway
> 
> +47 735 96282 (office)
> +47 909 89 255 (mobile)
> +47 735 96100 (fax)
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html and provide commented,
> minimal, self-contained, reproducible code.

Petr Pikal
petr.pikal at precheza.cz