[R] efficient way to make NAs of empty cells in a factor (or character)

Dimitris Rizopoulos dimitris.rizopoulos at med.kuleuven.be
Thu Aug 3 16:20:41 CEST 2006


Try the 'na.strings' argument of read.csv() (or read.csv2(), which
you used for your semicolon-separated file), e.g.,

test <- read.csv2("test.csv", dec = ".", na.strings = "")
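
A self-contained sketch of this suggestion, assuming the file is
semicolon-separated, as the read.csv2() call in the quoted message
implies. The temporary file and its contents are made up to mirror the
example data. stringsAsFactors = TRUE is set explicitly because recent
R versions no longer create factors by default, and "NA" is kept in
na.strings so that literal NA entries are still read as missing:

```r
# Hypothetical stand-in for the semicolon-separated "test.csv" in the question.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id;id2;x;y",
             "a;;1;",
             "b;e;;2.2",
             ";f;3;3.3",
             "c;g;4;4.4"),
           csv_path)

# na.strings = c("", "NA"): empty fields become NA in every column type,
# and literal "NA" entries are still treated as missing.
test <- read.csv2(csv_path, dec = ".", na.strings = c("", "NA"),
                  stringsAsFactors = TRUE)

is.na(test$id)   # the empty cell in row 3 should now be reported as missing
levels(test$id)  # the empty string should no longer appear as a level
```

Handling the blanks at read time avoids any later search and replace,
which is probably the simplest route when the files can be re-read.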


I hope it helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
     http://www.student.kuleuven.be/~m0390867/dimitris.htm



----- Original Message ----- 
From: "Henrik Parn" <henrik.parn at bio.ntnu.no>
To: "R-help" <r-help at stat.math.ethz.ch>
Sent: Thursday, August 03, 2006 3:46 PM
Subject: [R] efficient way to make NAs of empty cells in a factor (or character)


Dear all,

I have some csv-files (originating from Excel-files) containing empty
cells. In my example file I have four variables of different classes,
each with some empty cells in the original csv-file:

 > test <- read.csv2("test.csv", dec=".")

 > test
  id id2  x   y
1  a      1  NA
2  b   e NA 2.2
3      f  3 3.3
4  c   g  4 4.4


 > class(test$id)
[1] "factor"
 > class(test$id2)
[1] "factor"
 > class(test$x)
[1] "integer"
 > class(test$y)
[1] "numeric"

In the help text of read.csv2 you can read 'Blank fields are also
considered to be missing values in logical, integer, numeric and
complex fields.'. Thus, empty cells in a factor (or a character
vector, I assume) are not considered missing values but form a level
of their own:

 > is.na(test$id)
[1] FALSE FALSE FALSE FALSE
 > levels(test$id)
[1] ""  "a" "b" "c"

When I work with my real (larger) dataset I would like to use
functions like 'is.na' and '!is.na' on factors. Now I wonder if there
is an R alternative to doing 'search (for empty cells) and replace
(with NA)' in Excel?

I have tried a modification of Uwe Ligges' suggestion on missing values
posted 2 Aug:
 > is.na(test[test==""]) <- TRUE

...but it did not work on the data set:

Error in "[<-.data.frame"(`*tmp*`, test == "", value = c(NA, NA, NA, NA :
        rhs is the wrong length for indexing by a logical matrix


However it worked fine when applied to a single vector:

 > is.na(test$id[test$id==""]) <- TRUE
 > test$id
[1] a    b    <NA> c
Levels:  a b c

 > is.na(test$id)
[1] FALSE FALSE  TRUE FALSE
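
The per-vector replacement above can be applied to every column in
turn when the data frame is already loaded. A minimal sketch, assuming
a data frame shaped like the example; blank_to_na() is an illustrative
helper name, not a base R function:

```r
# Hypothetical data frame mimicking the one in the message, with "" as a level.
test <- data.frame(id  = factor(c("a", "b", "", "c")),
                   id2 = factor(c("", "e", "f", "g")),
                   x   = c(1L, NA, 3L, 4L),
                   y   = c(NA, 2.2, 3.3, 4.4))

# Illustrative helper: turn empty strings into NA, leave other columns alone.
blank_to_na <- function(col) {
  if (is.factor(col)) {
    # Setting the "" level to NA drops it and marks its entries as missing.
    levels(col)[levels(col) == ""] <- NA
  } else if (is.character(col)) {
    # which() skips entries that are already NA.
    col[which(col == "")] <- NA
  }
  col
}

# test[] keeps the data frame structure while replacing every column.
test[] <- lapply(test, blank_to_na)

is.na(test$id)
levels(test$id)
```

Working column by column sidesteps indexing the whole data frame with
a logical matrix, which is the step that raised the error above.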

Is there a more efficient way to turn empty cells into NA in all my
factors in R, or should I just do it in advance in Excel by 'search
and replace'?

Thanks in advance!

-- 
************************
Henrik Pärn
Department of Biology
NTNU
7491 Trondheim
Norway

+47 735 96282 (office)
+47 909 89 255 (mobile)
+47 735 96100 (fax)



