[R] Merge dataframes

jdanielnd jdanielnd at gmail.com
Fri Oct 7 15:34:33 CEST 2011


Hello,

I am having some problems using the 'merge' function. I'm not sure I
understand how it works.

What I want to do is:

1) Suppose I have a dataframe like:

        height           width
1        1.1                2.3
2        2.1                2.5
3        1.8                1.9
4        1.6                2.1
5        1.8                2.4
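For anyone who wants to reproduce this, here is a minimal sketch that builds the same dataframe in R, assuming the values shown in the table above. R assigns the automatic row names "1" to "5", which are what a merge by row names will later use as the key.

```r
# Rebuild the example dataframe from the table; row names default to "1".."5"
data1 <- data.frame(height = c(1.1, 2.1, 1.8, 1.6, 1.8),
                    width  = c(2.3, 2.5, 1.9, 2.1, 2.4))
print(data1)
```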

2) Then I generate a second dataframe by sampling rows from this one:

        height           width
1        1.1                2.3
3        1.8                1.9
5        1.8                2.4
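A sketch of step 2, assuming the sampled rows are exactly 1, 3 and 5 as in the table. Indexing rows with `[` keeps the original row names, so data2 still "knows" which rows of data1 it came from.

```r
data1 <- data.frame(height = c(1.1, 2.1, 1.8, 1.6, 1.8),
                    width  = c(2.3, 2.5, 1.9, 2.1, 2.4))
# Row subsetting keeps data1's row names ("1", "3", "5")
data2 <- data1[c(1, 3, 5), ]
print(rownames(data2))
```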

3) Next, I add a new variable to this dataframe:

        height            width         color
1        1.1                2.3            red
3        1.8                1.9            red
5        1.8                2.4            blue
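One way to do step 3, assuming the colors shown in the table: assigning the column with `$` adds `color` to data2 and leaves its row names untouched.

```r
data1 <- data.frame(height = c(1.1, 2.1, 1.8, 1.6, 1.8),
                    width  = c(2.3, 2.5, 1.9, 2.1, 2.4))
data2 <- data1[c(1, 3, 5), ]
# Add the new variable; row names "1", "3", "5" are preserved
data2$color <- c("red", "red", "blue")
print(data2)
```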

4) Finally, I want to merge those dataframes so that the new variable, color,
is bound to the first dataframe. Of course, some rows won't have a value for
it, since I generated this variable only in the smaller dataframe. In those
cases I want the value to be NA. The resulting dataframe should be:

        height            width         color
1        1.1                2.3            red
2        2.1                2.5            NA
3        1.8                1.9            red
4        1.6                2.1            NA
5        1.8                2.4            blue
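A minimal sketch of step 4 on the small example, assuming data1 and data2 are built as in steps 1 to 3. `by = 0` tells merge() to join on row names, `all.x = TRUE` keeps every row of data1 and fills color with NA where data2 has no match, and passing `data2["color"]` (a one-column dataframe rather than a bare vector) keeps data2's row names available as the key. merge() returns the key in a column named `Row.names` and sorts the rows by it, so the last steps move it back into the row names and restore data1's order.

```r
data1 <- data.frame(height = c(1.1, 2.1, 1.8, 1.6, 1.8),
                    width  = c(2.3, 2.5, 1.9, 2.1, 2.4))
data2 <- data1[c(1, 3, 5), ]
data2$color <- c("red", "red", "blue")

# Left join on row names; unmatched rows of data1 get color = NA
merged <- merge(data1, data2["color"], by = 0, all.x = TRUE)

# Turn the "Row.names" key column back into row names
rownames(merged) <- as.character(merged$Row.names)
merged$Row.names <- NULL

# Reorder to data1's original row order
result <- merged[rownames(data1), ]
print(result)
```

With only five rows the sorted order happens to match data1, but merge() sorts the key as character strings (so "10" comes before "2"); the final reindexing by `rownames(data1)` is what guarantees the original order on larger data.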

I have written some code, but it is not working properly. The values of the
new variable get mixed up, and they do not correspond to their row names.

# Generate the first dataframe
data1 <- data.frame(height=rnorm(20,3,0.2), width=rnorm(20,2,0.5))
# Sample a smaller dataframe from data1
data2 <- data1[sample(1:20,15,replace=FALSE),]
# Generate the new variable
color <- sample(c("red","blue"),15,replace=TRUE)
# Bind the new variable to data2
data2 <- cbind(data2, color)
# Merge data1 and data2$color by row names (by=0), keeping every row of
# data1 (all.x=TRUE). The merged result has the row names in column 1, so
# use that column as row names and reorder the rows to match data1.
data.frame(merge(data1, data2$color, by=0, all.x=TRUE),
           row.names=1)[row.names(data1),]
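A likely cause, offered as a sketch rather than a definitive diagnosis: `data2$color` extracts a bare vector, so merge() has to coerce it to a dataframe, and that dataframe gets fresh row names "1" to "15" instead of data2's sampled row names. The join then pairs data1's rows 1 to 15 with the colors in sampling order. The sketch below uses the same setup (the `set.seed()` call is an addition, only to make the run repeatable) and passes `data2["color"]` instead, which keeps data2's row names.

```r
set.seed(42)  # added only for repeatability; not in the original code
data1 <- data.frame(height = rnorm(20, 3, 0.2), width = rnorm(20, 2, 0.5))
data2 <- data1[sample(1:20, 15, replace = FALSE), ]
color <- sample(c("red", "blue"), 15, replace = TRUE)
data2$color <- color

# The cause: a bare vector has no row names, so the coerced dataframe
# gets "1".."15", which do not match data2's sampled row names
print(head(rownames(as.data.frame(data2$color))))
print(head(rownames(data2)))

# The fix: data2["color"] is a one-column dataframe keeping data2's row names
fixed <- merge(data1, data2["color"], by = 0, all.x = TRUE)
rownames(fixed) <- as.character(fixed$Row.names)
fixed$Row.names <- NULL
fixed <- fixed[rownames(data1), ]  # back to data1's row order
```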

I'm not sure what I am doing wrong. Can anyone see where the mistake is?

Thank you!

Cheers,

Joao D.

--
View this message in context: http://r.789695.n4.nabble.com/Merge-dataframes-tp3882222p3882222.html
Sent from the R help mailing list archive at Nabble.com.


