[R] Inexplicably different results using subset vs bracket notation on logical variable

peter dalgaard pdalgd at gmail.com
Tue Aug 28 07:57:25 CEST 2012


On Aug 28, 2012, at 05:11 , David Winsemius wrote:

> That's exactly it. If a logical index returns NA, its row is included in the output of "[" extraction. You can correct what I consider a failing and others consider a feature with:
> 
> df[df$Renewal==TRUE & !is.na(df$Renewal), 1:2]
> 

Precisely. To elaborate, some consider it a feature because, if the condition is NA, you effectively don't know whether to include the row or not, so "[" includes it with NA content, which is arguably a lesser loss of information.
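
A minimal sketch of the difference, assuming a small made-up data frame (the column names id and amount and all values are hypothetical; only Renewal comes from the thread):

```r
# Hypothetical data; only the column name Renewal is taken from the thread.
df <- data.frame(id = 1:4,
                 amount = c(10, 20, 30, 40),
                 Renewal = c(TRUE, NA, FALSE, TRUE))

# "[" keeps the row whose condition is NA and fills it with NA.
by_bracket <- df[df$Renewal == TRUE, 1:2]

# subset() treats an NA condition as FALSE and drops that row.
by_subset <- subset(df, Renewal == TRUE, select = 1:2)

# Guarding with !is.na() makes "[" select the same rows as subset().
by_guarded <- df[df$Renewal == TRUE & !is.na(df$Renewal), 1:2]

nrow(by_bracket)  # 3 (the second row is all NA)
nrow(by_subset)   # 2
nrow(by_guarded)  # 2
```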

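Also, it treats numerical and logical NA on the same footing, so that both x[NA] and x[as.numeric(NA)] return NA rather than dropping the element (the logical NA is recycled to the length of x, so the two results can differ in length). It would be awkward if x[NA] had length zero while x[c(1,NA)] had length 2.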
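
The lengths involved can be checked directly. This is only an illustration with a made-up vector x:

```r
# Hypothetical example vector.
x <- c(10, 20, 30)

by_logical_na <- x[NA]              # logical NA, recycled to length(x)
by_numeric_na <- x[as.numeric(NA)]  # a single numeric NA index
by_mixed      <- x[c(1, NA)]        # one valid index, one NA index

length(by_logical_na)  # 3
length(by_numeric_na)  # 1
length(by_mixed)       # 2
```

In every case an NA index yields an NA element instead of being dropped.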

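For integer NA, you definitely do not want to exclude; consider plot(..., col=c("red", "blue", "green")[g]). If the NA entries of g were dropped, the colour vector would be shorter than the data and would no longer line up with the points.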
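
A short sketch of why this matters for the colour lookup, assuming a hypothetical grouping vector g with one missing value:

```r
# Hypothetical group codes; the third observation has no group.
g <- c(1L, 3L, NA, 2L)

cols <- c("red", "blue", "green")[g]
cols  # "red" "green" NA "blue"

# Because the NA is kept, there is still one colour slot per observation.
length(cols) == length(g)  # TRUE
```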

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com



