[R] efficient way to make NAs of empty cells in a factor (or character)
Dimitris Rizopoulos
dimitris.rizopoulos at med.kuleuven.be
Thu Aug 3 16:20:41 CEST 2006
Try using the 'na.strings' argument of read.csv2() (read.csv() and read.table() have the same argument), e.g.,
test <- read.csv2("test.csv", dec = ".", na.strings = "")
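Below is a self-contained sketch of the na.strings approach. It assumes test.csv is semicolon-separated, as the poster's read.csv2(..., dec = ".") call suggests, and rebuilds its contents from the printout further down as a hypothetical string (csv_text). The `text` argument is used only to keep the sketch runnable without a file. stringsAsFactors = TRUE is added because current versions of R no longer turn text columns into factors by default.

```r
# Hypothetical contents of test.csv, rebuilt from the poster's printout
csv_text <- "id;id2;x;y
a;;1;
b;e;;2.2
;f;3;3.3
c;g;4;4.4"

# With na.strings = "", blank fields are read as NA in every column,
# including the text columns that become factors.
test <- read.csv2(text = csv_text, dec = ".", na.strings = "",
                  stringsAsFactors = TRUE)

is.na(test$id)    # FALSE FALSE  TRUE FALSE
levels(test$id)   # "a" "b" "c"  (no "" level)
```

Because the blanks never become a level, no cleanup is needed after reading. The trade-off is that the literal string "NA" is no longer treated as missing unless you pass na.strings = c("", "NA").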
I hope it helps.
Best,
Dimitris
----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven
Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
http://www.student.kuleuven.be/~m0390867/dimitris.htm
----- Original Message -----
From: "Henrik Parn" <henrik.parn at bio.ntnu.no>
To: "R-help" <r-help at stat.math.ethz.ch>
Sent: Thursday, August 03, 2006 3:46 PM
Subject: [R] efficient way to make NAs of empty cells in a factor (or character)
Dear all,
I have some csv-files (originating from Excel-files) containing empty
cells. In my example file I have four variables of different classes,
each with some empty cells in the original csv-file:
> test <- read.csv2("test.csv", dec=".")
> test
  id id2  x   y
1  a      1  NA
2  b   e NA 2.2
3      f  3 3.3
4  c   g  4 4.4
> class(test$id)
[1] "factor"
> class(test$id2)
[1] "factor"
> class(test$x)
[1] "integer"
> class(test$y)
[1] "numeric"
In the help text of read.csv2 you can read 'Blank fields are also
considered to be missing values in logical, integer, numeric and
complex fields.' Thus, empty cells in a factor (or, I assume, a
character vector) are not considered missing values but form a level
of their own:
> is.na(test$id)
[1] FALSE FALSE FALSE FALSE
> levels(test$id)
[1] "" "a" "b" "c"
When I work with my real (larger) dataset I would like to use
functions like 'is.na' and '!is.na' on factors. Is there an R
alternative to Excel's 'search (for empty cells) and replace (with NA)'?
I have tried a modification of Uwe Ligges' suggestion on missing
values, posted 2 Aug:
> is.na(test[test==""]) <- TRUE
...but it did not work on the data set:
Error in "[<-.data.frame"(`*tmp*`, test == "", value = c(NA, NA, NA, NA :
        rhs is the wrong length for indexing by a logical matrix
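A likely explanation of this error, offered as a reading of the data frame methods rather than a confirmed diagnosis: test == "" is NA wherever x or y is already NA, so test[test == ""] extracts those NA positions as well as the blank ones. Here that gives four elements but only two TRUE cells, and the four NAs that `is.na<-` returns no longer fit. The sketch below rebuilds the poster's data frame by hand (an assumption based on the printout) and assigns a single NA instead. A length-one value is recycled, and cells whose index is NA are skipped. The unused "" level survives the assignment, so droplevels() removes it afterwards. This was checked against the behaviour of current R, not the 2006 release.

```r
# Hypothetical reconstruction of the poster's data frame
test <- data.frame(id  = factor(c("a", "b", "", "c")),
                   id2 = factor(c("", "e", "f", "g")),
                   x   = c(1L, NA, 3L, 4L),
                   y   = c(NA, 2.2, 3.3, 4.4))

blank <- test == ""          # logical matrix, NA where x or y is NA
sum(blank, na.rm = TRUE)     # 2 cells to replace
length(test[blank])          # 4 elements extracted (NA positions included)

# A single NA on the right-hand side avoids the length mismatch.
test[blank] <- NA
# The "" level is now unused but still listed, so drop it.
test <- droplevels(test)
levels(test$id)              # "a" "b" "c"
```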
However, it worked fine when applied to a single vector:
> is.na(test$id[test$id==""]) <- TRUE
> test$id
[1] a b <NA> c
Levels: a b c
> is.na(test$id)
[1] FALSE FALSE TRUE FALSE
Is there a more efficient way to replace the empty cells with NA in
all my factors at once in R, or should I just do it in advance in
Excel by 'search and replace'?
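For data that has already been read in, the single-vector fix above can be applied to every column at once. The sketch below uses a hypothetical helper, blank_to_na(), and is one possible approach rather than anything proposed in this thread. For factors it relies on the fact that assigning NA to a level removes that level and turns its entries into NA. Character columns get a plain subscript assignment, and numeric columns are left untouched.

```r
# Hypothetical helper: turn "" into NA in every factor or character column
blank_to_na <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.factor(col)) {
      # Setting the "" level to NA drops it and makes its entries NA
      levels(col)[levels(col) == ""] <- NA
    } else if (is.character(col)) {
      col[!is.na(col) & col == ""] <- NA
    }
    col   # numeric/integer columns are returned unchanged
  })
  df
}

# Usage on a hand-built copy of the poster's data
test <- data.frame(id  = factor(c("a", "b", "", "c")),
                   id2 = factor(c("", "e", "f", "g")),
                   x   = c(1L, NA, 3L, 4L),
                   y   = c(NA, 2.2, 3.3, 4.4))
test <- blank_to_na(test)
levels(test$id)   # "a" "b" "c"
is.na(test$id2)   # TRUE FALSE FALSE FALSE
```

Writing back through df[] keeps the result a data frame with the original column names and row names. lapply() alone would return a plain list.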
Thanks in advance!
--
************************
Henrik Pärn
Department of Biology
NTNU
7491 Trondheim
Norway
+47 735 96282 (office)
+47 909 89 255 (mobile)
+47 735 96100 (fax)
______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.