[R] efficient way to make NAs of empty cells in a factor (or character)
Petr Pikal
petr.pikal at precheza.cz
Thu Aug 3 16:40:42 CEST 2006
Hi,
try setting
na.strings = ""
in the call to read.csv2. It works for me:
> is.na(read.delim("clipboard", na.strings="")$mono)
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
> read.delim("clipboard", na.strings="")$mono
[1] hruby hruby jemny jemny nejhrubsi nejhrubsi
standard standard <NA>
Levels: hruby jemny nejhrubsi standard
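For anyone who wants to try this without a file or the clipboard, here is a minimal sketch. The inline data are only a hypothetical stand-in for the poster's test.csv, and stringsAsFactors = TRUE is set explicitly because recent R versions no longer convert strings to factors by default.

```r
# Hypothetical stand-in for test.csv (semicolon-separated, as read.csv2 expects)
csv_text <- "id;id2;x;y
a;;1;
b;e;;2.2
;f;3;3.3
c;g;4;4.4"

# na.strings = "" turns empty fields into NA in every column, factors included
test <- read.csv2(text = csv_text, dec = ".", na.strings = "",
                  stringsAsFactors = TRUE)

is.na(test$id)    # the empty cell in row 3 is reported as missing
levels(test$id)   # no "" level is created
```

Note that setting na.strings = "" replaces the default "NA", so a literal NA in a text column would then be read as an ordinary string; use na.strings = c("", "NA") if both should count as missing.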
Or, after reading the data, you can try
test[(test=="")] <- NA
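A minimal sketch of this replacement on a whole data frame follows. The inline data are a hypothetical stand-in for test.csv, and the droplevels() step is an addition of mine, included on the assumption that you also want to get rid of the empty level, which otherwise stays behind as an unused level.

```r
# Hypothetical stand-in for test.csv (semicolon-separated, as read.csv2 expects)
csv_text <- "id;id2;x;y
a;;1;
b;e;;2.2
;f;3;3.3
c;g;4;4.4"

# Default import: empty cells in the factor columns become a "" level
test <- read.csv2(text = csv_text, dec = ".", stringsAsFactors = TRUE)

# Replace every empty string in the data frame with NA in one step
test[test == ""] <- NA

# The "" level is now unused; droplevels() removes it from all factors
test <- droplevels(test)

is.na(test$id)
levels(test$id)
```

Unlike is.na(test[test == ""]) <- TRUE, this assigns a single NA, which is recycled over all matching cells, so the length of the right-hand side is not an issue.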
HTH
Petr
On 3 Aug 2006 at 15:46, Henrik Parn wrote:
Date sent: Thu, 03 Aug 2006 15:46:32 +0200
From: Henrik Parn <henrik.parn at bio.ntnu.no>
Organization: NTNU
To: R-help <r-help at stat.math.ethz.ch>
Subject: [R] efficient way to make NAs of empty cells in a factor (or
character)
Send reply to: henrik.parn at bio.ntnu.no
> Dear all,
>
> I have some csv-files (originating from Excel-files) containing empty
> cells. In my example file I have four variables of different classes,
> each with some empty cells in the original csv-file:
>
> > test <- read.csv2("test.csv", dec=".")
>
> > test
>   id id2  x   y
> 1  a      1  NA
> 2  b   e NA 2.2
> 3      f  3 3.3
> 4  c   g  4 4.4
>
>
> > class(test$id)
> [1] "factor"
> > class(test$id2)
> [1] "factor"
> > class(test$x)
> [1] "integer"
> > class(test$y)
> [1] "numeric"
>
> In the help text of read.csv2 you can read 'Blank fields are also
> considered to be missing values in logical, integer, numeric and
> complex fields.'. Thus, an empty cell in a factor (or a character
> vector, I assume) is not considered a missing value but a level of its own:
>
> > is.na(test$id)
> [1] FALSE FALSE FALSE FALSE
> > levels(test$id)
> [1] "" "a" "b" "c"
>
> When I work with my real (larger) dataset I would like to use
> functions like 'is.na' and '!is.na' on factors. Now I wonder if there
> is an R alternative to doing 'search (for empty cells) and replace
> (with NA)' in Excel?
>
> I have tried a modification of Uwe Ligges' suggestion on missing values,
> posted 2 Aug:
> > is.na(test[test==""]) <- TRUE
>
> ...but it did not work on the data set:
>
> Error in "[<-.data.frame"(`*tmp*`, test == "", value = c(NA, NA, NA,
> NA :
> rhs is the wrong length for indexing by a logical matrix
>
>
> However it worked fine when applied to a single vector:
>
> > is.na(test$id[test$id==""]) <- TRUE
> > test$id
> [1] a b <NA> c
> Levels: a b c
>
> > is.na(test$id)
> [1] FALSE FALSE TRUE FALSE
>
> Is there a more efficient way to fill empty cells in all my factors in
> R or should I just do it in advance in Excel by 'search and replace'?
>
> Thanks in advance!
>
> --
> ************************
> Henrik Pärn
> Department of Biology
> NTNU
> 7491 Trondheim
> Norway
>
> +47 735 96282 (office)
> +47 909 89 255 (mobile)
> +47 735 96100 (fax)
>
Petr Pikal
petr.pikal at precheza.cz