[R] Why na.rm=FALSE is the default

Tue Mar 24 20:07:30 CET 2009

The only reference that I can think of (a bit subtle/indirect) is: http://www.burns-stat.com/pages/Tutor/spreadsheet_addiction.html (look in the section on Propagation of Blanks).

But I think that it really comes down to the following two variations on one rule:

1. Important decisions (such as throwing away information) should be made by a human, not a computer.
2. Important decisions (such as throwing away information) should be made by a person familiar with the data and the scientific question, not by a programmer separated in time and space from the real question, who is unlikely to have anticipated every situation.
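
To make the trade-off concrete, here is a minimal sketch in R. The vector x is made-up illustrative data, not taken from any real analysis. It shows that the default forces the missing value to be noticed, that removal is an explicit choice, and that mixing the two conventions quietly gives a third answer.

```r
# Hypothetical data: three observed values and one missing value.
x <- c(2, 4, NA, 6)

# Default behaviour (na.rm = FALSE): the missing value propagates,
# so the analyst cannot overlook it.
default_mean <- mean(x)

# Explicit opt-in: the analyst decides to drop the missing value.
trimmed_mean <- mean(x, na.rm = TRUE)

# Mixed conventions: NA is dropped from the sum but still counted
# by length(), so the result differs from trimmed_mean.
mixed_mean <- sum(x, na.rm = TRUE) / length(x)

print(default_mean)   # NA
print(trimmed_mean)   # 4
print(mixed_mean)     # 3
```

With silent removal as the default, the last two results could disagree without anyone having decided that they should.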

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Adam D. I. Kramer
> Sent: Tuesday, March 24, 2009 12:24 PM
> To: r-help at r-project.org
> Subject: [R] Why na.rm=FALSE is the default
> 
> Dear Colleagues,
> 
>  	I've been searching for a post or article or something which
> explains why having na.rm=FALSE or na.action=na.fail as the default is a
> better choice than TRUE or na.omit.
> 
>  	I understand the basic argument: it does not make sense to
> average a nonexistence into an aggregate, and removing missing values
> implicitly leads to accidental pairwise deletion in some cases, and
> sum(x) / length(x) < mean(x) (which many would find disturbing)... I'm
> just looking for a source to cite on this issue to support mimicking R's
> behavior in a database system's aggregating functions (sum, avg, var, etc.).
> 
> Cordially,
> Adam Kramer
> Ph.D. Candidate, Social Psychology
> University of Oregon
> adik at uoregon dot edu
> 