[R] Doubt in simple merge
Marc Schwartz
marc_schwartz at me.com
Fri Jan 17 14:16:10 CET 2014
On Jan 16, 2014, at 11:14 PM, kingsly <ecokingsly at yahoo.co.in> wrote:
> Thank you dear friends. You have cleared my first doubt.
>
> My second doubt:
> I have the same data sets "Elder" and "Younger".
> Elder <- data.frame(
> ID=c("ID1","ID2","ID3"),
> age=c(38,35,31))
> Younger <- data.frame(
> ID=c("ID4","ID5","ID3"),
> age=c(29,21,"NA"))
>
>
> Row ID3 comes in both data set. It has a value (31) in "Elder" while "NA" in "Younger".
>
> I need output like this.
>
> ID age
> ID1 38
> ID2 35
> ID3 31
> ID4 29
> ID5 21
>
> Kindly help me.
First, there is a problem with the way you created Younger: you entered the NA as "NA", which is a character string. That coerces the whole 'age' vector to character, and data.frame() then turns the column into a factor rather than a numeric:
> str(Younger)
'data.frame': 3 obs. of 2 variables:
$ ID : Factor w/ 3 levels "ID3","ID4","ID5": 2 3 1
$ age: Factor w/ 3 levels "21","29","NA": 2 1 3
It then causes problems in the default merge():
DF <- merge(Elder, Younger, by = c("ID", "age"), all = TRUE)
> str(DF)
'data.frame': 6 obs. of 2 variables:
$ ID : Factor w/ 5 levels "ID1","ID2","ID3",..: 1 2 3 3 4 5
$ age: chr "38" "35" "31" "NA" ...
Note that 'age' is now a character vector, again rather than numeric, and that ID3 appears twice because "31" and "NA" do not match.
Thus:
Younger <- data.frame(ID = c("ID4", "ID5", "ID3"), age = c(29, 21, NA))
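If the data frame already exists and you would rather repair it than recreate it, the column can be converted in place. This is only a sketch, and it assumes that "NA" is the only non-numeric string in the column. The as.character() step matters when 'age' is a factor, as in the str() output above:

```r
# Recreate the problematic data frame from the original post.
Younger <- data.frame(ID = c("ID4", "ID5", "ID3"),
                      age = c(29, 21, "NA"))

# Convert via character first: calling as.numeric() directly on a
# factor would return the internal level codes, not the ages.
Younger$age <- as.numeric(as.character(Younger$age))

str(Younger$age)
```

After the conversion, the string "NA" has become a real missing value and 'age' is numeric.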
Now, when you merge as before, you get:
> str(merge(Elder, Younger, by = c("ID", "age"), all = TRUE))
'data.frame': 6 obs. of 2 variables:
$ ID : Factor w/ 5 levels "ID1","ID2","ID3",..: 1 2 3 3 4 5
$ age: num 38 35 31 NA 29 21
> merge(Elder, Younger, by = c("ID", "age"), all = TRUE)
   ID age
1 ID1  38
2 ID2  35
3 ID3  31
4 ID3  NA
5 ID4  29
6 ID5  21
Presuming that you want to consistently remove any NA values that may arise from either data frame:
> na.omit(merge(Elder, Younger, by = c("ID", "age"), all = TRUE))
   ID age
1 ID1  38
2 ID2  35
3 ID3  31
5 ID4  29
6 ID5  21
See ?na.omit
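One caveat: na.omit() drops every row that contains an NA, so an ID whose only record has a missing age would disappear from the result. If you would rather keep one row per ID and prefer a known age whenever one exists, something along these lines may work. It is a sketch rather than the approach used above, and the names 'both' and 'result' are just placeholders:

```r
Elder   <- data.frame(ID = c("ID1", "ID2", "ID3"), age = c(38, 35, 31))
Younger <- data.frame(ID = c("ID4", "ID5", "ID3"), age = c(29, 21, NA))

# Stack the two data frames; both have the same columns.
both <- rbind(Elder, Younger)

# Sort by ID, placing known ages ahead of NA within each ID
# (is.na() is FALSE for known ages, and FALSE sorts before TRUE).
both <- both[order(both$ID, is.na(both$age)), ]

# Keep the first row per ID, which is a non-NA row whenever one exists.
result <- both[!duplicated(both$ID), ]
rownames(result) <- NULL
result
```

For the example data this gives one row for each of ID1 to ID5, with ID3 keeping the age from Elder.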
Regards,
Marc Schwartz