[R] Inconsistence in specifying action for missing data
Thomas Lumley
tlumley at u.washington.edu
Sun Sep 4 18:42:37 CEST 2005
On Sat, 3 Sep 2005, John Sorkin wrote:
> A question for R (and perhaps S and SPlus) historians.
>
> Does anyone know the reason for the inconsistency in the way that the
> action that should be taken when data are missing is specified? There
> are several variants, na.action, na.omit, "T", TRUE, etc. I know that a
> foolish consistency is the hobgoblin of a small mind, but consistency
> can make things easier.
>
There's actually a little more consistency than first appears. The two
most common ways to refer to missingness are na.rm and na.action. Usually
na.rm is a logical flag with default FALSE (set it with TRUE; using T is a
bug), and na.rm=TRUE removes NAs from one vector at a time.
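
A minimal sketch of the na.rm behaviour, using base R only. The vector x and
the result names are made up for illustration. It also shows why T is
risky: T is an ordinary variable that can be reassigned, whereas TRUE is a
reserved word.

```r
# Illustrative data (hypothetical): one missing value in a numeric vector
x <- c(1, 2, NA, 4)

# na.rm defaults to FALSE in mean(), so the NA propagates to the result
m_default <- mean(x)

# With na.rm = TRUE the NA is dropped from this one vector before averaging
m_clean <- mean(x, na.rm = TRUE)
s_clean <- sum(x, na.rm = TRUE)

# T is just a variable bound to TRUE; anyone can rebind it
T <- FALSE
m_with_T <- mean(x, na.rm = T)   # now behaves like na.rm = FALSE
rm(T)                            # remove the masking variable again

print(c(m_default, m_clean, s_clean, m_with_T))
```

Here m_default and m_with_T are both NA, while m_clean and s_clean are
computed from the three non-missing values.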
na.action usually has default na.omit and works on whole data frames, e.g.
na.omit and na.exclude do casewise deletion if any variable is NA.
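
As a rough illustration of casewise deletion, the sketch below uses a small
made-up data frame d (the values and the variable names y and x are
assumptions, not from any real data set) and fits it with lm() from the
stats package. Both na.omit and na.exclude drop any row with an NA in
either variable before fitting; they differ in that na.exclude pads
residuals and fitted values back out to the original number of rows.

```r
# Hypothetical data: row 3 is missing x, row 4 is missing y
d <- data.frame(y = c(1.0, 2.1, 2.9, NA, 5.2, 6.1),
                x = c(1,   2,   NA,  4,  5,   6))

# na.omit on a data frame drops every row that has an NA in any column
n_complete <- nrow(na.omit(d))

# Same casewise deletion inside a model fit, under either na.action
fit_omit <- lm(y ~ x, data = d, na.action = na.omit)
fit_excl <- lm(y ~ x, data = d, na.action = na.exclude)

# na.omit: residuals only for the rows actually used
n_res_omit <- length(residuals(fit_omit))

# na.exclude: residuals padded with NA to match the rows of d
n_res_excl <- length(residuals(fit_excl))

print(c(n_complete, n_res_omit, n_res_excl))
```

The coefficients from the two fits are identical, since the same rows are
used; only the shape of the residuals and fitted values differs.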
These aren't completely uniform, and that is simply historical. I think
there was once an attempt to make na.fail() the default na.action, but
there was too much resistance to change.
-thomas