[R] Strange dataframe behavior

Sergey Goriatchev sergeyg at gmail.com
Tue Oct 23 19:10:23 CEST 2007


Hello,

I have a question regarding the following output:

> database <- read.delim(file=path.input.file, header=TRUE, dec=".", sep="\t", na.strings =  "#NV")
> str(database)
'data.frame':   314 obs. of  13 variables:
 $ S       : Factor w/ 314 levels "307073","400212",..: 147 72 299 137 162 62 189 236 134 307 ...
 $ A   : Factor w/ 314 levels "Alfa",...: 285 258 197 3 81 162 183 272 73 301 ...
 $ M: Factor w/ 19 levels "@NA","A",..: 18 10 11 6 7 12 17 17 11 6 ...
 $ W       : num  0 0 0 0 0 ...
 $ T : num   0.0467  0.1095  0.0252  0.0821 -0.0275 ...
 $ C : num  0 0 0 0 0 ...
 $ MF   : num  -0.658  0.261  0.922 -1.897 -1.884 ...
 $ V    : num   0.0585 -1.0852 -0.3156 -1.0592  0.2810 ...
 $ G       : num  -0.568 -1.302  0.225 -1.473 -0.541 ...
 $ Mo     : num   0.34967  0.42807 -0.41407 -0.18216 -0.00305 ...
 $ R     : num  -0.5413 -2.0000  0.5353 -1.1437 -0.0776 ...
 $ Tr        : num  -0.12816  1.04148  0.00647 -0.02424 -1.66834 ...
 $ Su    : num  -1.611  1.160 -0.528 -0.091 -1.148 ...
> which(is.na(database))
[1] 675 704 774 887

So I have 314 observations, but which(is.na(database)) reports four NA
values at positions I cannot account for. I then remove one observation
(for certain reasons), drop the corresponding factor level, and get:
> str(database)
'data.frame':   313 obs. of  13 variables:
....
> which(is.na(database))
[1] 673 702 772 885

Removing ONE observation shifts each reported NA position by two.
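
The positions can be decoded by hand. This is a minimal sketch, assuming
the standard base R behaviour: is.na() on a data frame returns a logical
matrix, and which() numbers its entries column by column. It uses only
the index values and dimensions shown in the output above. arrayInd() is
a base R function; the variable names are illustrative.

```r
# Linear indices reported by which(is.na(database)) in the output above
idx_before <- c(675L, 704L, 774L, 887L)  # data frame with 314 rows, 13 columns
idx_after  <- c(673L, 702L, 772L, 885L)  # data frame with 313 rows, 13 columns

# arrayInd() converts column-major linear indices into (row, column) pairs
pos_before <- arrayInd(idx_before, .dim = c(314L, 13L))
pos_after  <- arrayInd(idx_after,  .dim = c(313L, 13L))

print(pos_before)  # first column: row number, second column: column number
print(pos_after)

# The same (row, column) pairs can be requested directly from the data:
# which(is.na(database), arr.ind = TRUE)
```

Under these assumptions every index falls in the third column (M), in
the same rows before and after the removal. Two complete columns precede
it and each became one entry shorter, which would explain a shift of
exactly two.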

Does anyone have an idea what these NA entries mean?
Thanks in advance for your time and help!

Sergey
University of Zurich


