[R] merge problem... extra lines appear in the presence of NAs

Sean O'Riordain seanpor at acm.org
Sat May 20 14:32:26 CEST 2006


Good morning!

I've searched the docs without finding an answer. Am I doing something wrong, or is this a bug?

I'm merging two data frames and getting extra rows in the result.
The data frames being merged may contain NAs.

count <- 10
nacount <- 3
a1 <- as.data.frame(as.Date("2005-06-01")+0:(count-1))
names(a1) <- "mdate"
a1$value <- runif(count)
a1[floor(runif(nacount)*count),]$value <- NA

a2 <- as.data.frame(as.Date("2005-06-01")+0:(count-1))
names(a2) <- "mdate"
a2$value2 <- runif(count)
#a2[floor(runif(nacount)*count),]$value2 <- NA

> a1
        mdate     value
1  2005-06-09        NA
2  2005-06-02 0.5287683
3  2005-06-03 0.7563833
4  2005-06-09        NA
5  2005-06-05 0.1027646
6  2005-06-06 0.7775884
7  2005-06-07 0.2993592
8  2005-06-09        NA
9  2005-06-09 0.7434682
10 2005-06-10 0.2096477
> a2
        mdate    value2
1  2005-06-01 0.5347852
2  2005-06-02 0.9322765
3  2005-06-03 0.9106499
4  2005-06-04 0.6810564
5  2005-06-05 0.5871867
6  2005-06-06 0.8123808
7  2005-06-07 0.9675379
8  2005-06-08 0.9470369
9  2005-06-09 0.7493767
10 2005-06-10 0.8864103
> atot <- merge(a1,a2,all=T)

However, I find the following result quite unintuitive. Is it
correct?  May I draw your attention to rows 9:12.  Should
rows 9:11 be there?

> atot
        mdate     value    value2
1  2005-06-01        NA 0.5347852
2  2005-06-02 0.5287683 0.9322765
3  2005-06-03 0.7563833 0.9106499
4  2005-06-04        NA 0.6810564
5  2005-06-05 0.1027646 0.5871867
6  2005-06-06 0.7775884 0.8123808
7  2005-06-07 0.2993592 0.9675379
8  2005-06-08        NA 0.9470369
9  2005-06-09        NA 0.7493767
10 2005-06-09        NA 0.7493767
11 2005-06-09        NA 0.7493767
12 2005-06-09 0.7434682 0.7493767
13 2005-06-10 0.2096477 0.8864103
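
One thing worth checking, going only by the printed a1 above: rows 1, 4, 8 and 9 all show mdate 2005-06-09, so the key column is no longer unique, and merge() returns one row for every matching pair of keys. A minimal sketch of that check follows. The frames b1 and b2 and the names key_counts and dup_keys are hypothetical, made up for illustration, and are not the data above.

```r
# Hypothetical frames: b1 repeats one key value, b2 has each key once.
b1 <- data.frame(mdate = as.Date("2005-06-01") + c(2, 0, 2, 2),
                 value = c(NA, 0.5, NA, 0.7))
b2 <- data.frame(mdate = as.Date("2005-06-01") + 0:2,
                 value2 = c(0.1, 0.2, 0.3))

# Count how often each key occurs in b1; a count above 1 means
# merge() will emit that many rows for the matching key in b2.
key_counts <- table(b1$mdate)
dup_keys <- names(key_counts)[key_counts > 1]

# Full outer join on the date column.
btot <- merge(b1, b2, by = "mdate", all = TRUE)
nrow(btot)
```

Running the same table() or anyDuplicated() check on a1$mdate before the merge would show whether the extra rows come from repeated dates rather than from the NAs themselves.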

Note that with no NAs it works as expected:
> a1 <- as.data.frame(as.Date("2005-06-01")+0:(count-1))
> names(a1) <- "mdate"
> a1$value <- runif(count)
> #a1[floor(runif(nacount)*count),]$value <- NA
>
> atot <- merge(a1,a2,all=T)
>
> atot
        mdate      value    value2
1  2005-06-01 0.35002519 0.5347852
2  2005-06-02 0.76318940 0.9322765
3  2005-06-03 0.32759570 0.9106499
4  2005-06-04 0.47218729 0.6810564
5  2005-06-05 0.74435374 0.5871867
6  2005-06-06 0.81415290 0.8123808
7  2005-06-07 0.04774783 0.9675379
8  2005-06-08 0.21799101 0.9470369
9  2005-06-09 0.99472758 0.7493767
10 2005-06-10 0.41974293 0.8864103
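
For comparison, here is a possible way to put the NAs in without assigning through a row subset of the whole data frame, offered as a sketch rather than a diagnosis. floor(runif(nacount)*count) can return 0 or repeated indices, whereas sample(count, nacount) draws nacount distinct indices from 1..count. The names c1 and idx are assumptions for illustration only.

```r
count <- 10
nacount <- 3
c1 <- data.frame(mdate = as.Date("2005-06-01") + 0:(count - 1),
                 value = runif(count))

# Draw nacount distinct row indices from 1..count (never 0, no repeats).
idx <- sample(count, nacount)

# Assign into the value vector only, so mdate is not part of the assignment.
c1$value[idx] <- NA

sum(is.na(c1$value))      # number of NAs placed
anyDuplicated(c1$mdate)   # 0 when every date is still unique
```

With the dates left unique, a merge against a2 on mdate should give one row per date.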

R started in each case with --vanilla
               _
platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status         Patched
major          2
minor          3.0
year           2006
month          05
day            11
svn rev        38037
language       R
version.string Version 2.3.0 Patched (2006-05-11 r38037)

Windows XP Pro SP2, binary installs from CRAN


It behaves the same way if I say
atot <- merge(a1,a2,by.x="mdate",by.y="mdate",all=TRUE)
or even
atot <- merge(a1,a2,by="mdate",all=TRUE)

Also tested on versions 2.2.1 and 2.3.0.

cheers,
Sean O'Riordain

(ps. ctrl-v paste wouldn't work on 2.4.0-dev downloaded this morning -
didn't try very hard though)



