[R] insert values based on common ID

Marc Schwartz marc_schwartz at me.com
Fri May 14 17:24:53 CEST 2010


On May 14, 2010, at 10:15 AM, emorway wrote:

> 
> Forum,
> 
> with the datasets a and b below, I'm trying to establish a relationship
> based on the common column "week" and insert the value from the column
> weekAvg in b to the column weekAvg in a.  The dataset a is several thousand
> lines long.  I've tried looking at 'match', writing functions, 'rbind.fill'
> and various search terms in this forum to no avail.  Thanks...
> 
> a<-read.table(textConnection("row ID date Reading WellID week weekAvg
> 1 Well_80-2 6/12/2000 2:00 -202.034 80-2 0 NA
> 2 Well_80-2 6/12/2000 3:00 -201.018 80-2 0 NA
> 3 Well_80-2 6/12/2000 4:00 -199.494 80-2 0 NA
> 4 Well_80-2 6/12/2000 5:00 -197.716 80-2 0 NA
> 5 Well_80-2 6/12/2000 6:00 -190.858 80-2 0 NA
> 6 Well_80-2 6/12/2000 7:00 -181.460 80-2 0 NA
> 7 Well_80-2 6/19/2000 10:00 -166.728 80-2 1 NA
> 8 Well_80-2 6/19/2000 11:00 -167.490 80-2 1 NA
> 9 Well_80-2 6/19/2000 12:00 -167.490 80-2 1 NA
> 10 Well_80-2 6/19/2000 13:00 -167.490 80-2 1 NA
> 11 Well_80-2 6/19/2000 14:00 -168.506 80-2 1 NA
> 12 Well_80-2 6/19/2000 15:00 -168.506 80-2 1 NA"),header=T)
> closeAllConnections() 
> 
> b<-read.table(textConnection("week weekAvg  
> 0 -147.3726
> 1 -181.3429
> 2 -151.7208
> 3 -188.8653
> 4 -163.7465
> 5 -161.6873
> 6 -158.5168
> 7 -146.6136
> 8 -175.4351
> 9 -100.9450
> 10 -151.3655
> 11 -125.8975
> 12 -162.5993"),header=T)
> closeAllConnections() 


See ?merge and ?subset

merge() performs a SQL-like 'join' operation.


> merge(subset(a, select = -weekAvg), b, by = "week", all.x = TRUE)
   week       row        ID  date  Reading WellID   weekAvg
1     0 Well_80-2 6/12/2000  2:00 -202.034   80-2 -147.3726
2     0 Well_80-2 6/12/2000  3:00 -201.018   80-2 -147.3726
3     0 Well_80-2 6/12/2000  4:00 -199.494   80-2 -147.3726
4     0 Well_80-2 6/12/2000  5:00 -197.716   80-2 -147.3726
5     0 Well_80-2 6/12/2000  6:00 -190.858   80-2 -147.3726
6     0 Well_80-2 6/12/2000  7:00 -181.460   80-2 -147.3726
7     1 Well_80-2 6/19/2000 10:00 -166.728   80-2 -181.3429
8     1 Well_80-2 6/19/2000 11:00 -167.490   80-2 -181.3429
9     1 Well_80-2 6/19/2000 12:00 -167.490   80-2 -181.3429
10    1 Well_80-2 6/19/2000 13:00 -167.490   80-2 -181.3429
11    1 Well_80-2 6/19/2000 14:00 -168.506   80-2 -181.3429
12    1 Well_80-2 6/19/2000 15:00 -168.506   80-2 -181.3429


In the above, I am using subset() on 'a' to remove the pre-existing 'weekAvg' column, so that you only end up with one such column post merge. If you don't do this, you will have a 'weekAvg.x' and 'weekAvg.y' in the result.

The two data frames are then merge()d on the common 'week' column. 

The 'all.x = TRUE', retains all rows in 'a' that match to the 'week' value in 'b'. Otherwise known as a 'left outer join'.

HTH,

Marc Schwartz



More information about the R-help mailing list