[R] merge: right set overwrite left set
Ista Zahn
istazahn at gmail.com
Sun Jul 12 19:06:08 CEST 2015
I think this does what you want:
## find individual columns in x.HHu.map that don't exist in y.HHo.map
x.HHu.map <- x.HHu.map[
  c("HHid",
    "position",
    names(x.HHu.map)[!names(x.HHu.map) %in% names(y.HHo.map)]
  )]
## merge, adding extra column from x.HHu.map
zzz <- merge(y.HHo.map, x.HHu.map, by=c('HHid', 'position'), all=TRUE)
## order by HHid
zzz <- zzz[order(zzz$HHid),]
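
For anyone who wants to run this end to end, here is a minimal sketch built on the sample data from the quoted message below. It makes two substitutions that are not in the code above: setdiff() replaces the `!... %in% ...` indexing (the two should pick the same columns here), and the subset is stored in a hypothetical `x.extra` instead of overwriting x.HHu.map.

```r
# Sample data copied from the quoted message.
x.HHu <- data.frame(
  HHid  = c("HH1", "HH2", "HH3", "HH4", "HH5", "HH10"),
  indv1 = c(2, 0, 2, 0, 2, 0),
  indv2 = c(0, NA, 2, 2, 2, 2),
  ind3  = c(0, 0, 0, 0, 0, 0)
)
y.HHo <- data.frame(
  HHid  = c("HH1", "HH2", "HH5", "HH3", "HH10"),
  indv1 = c(2, 0, 2, 0, NA),
  indv2 = c(0, 2, 2, 1, 2)
)
z.map <- data.frame(
  HHid     = c("HH1", "HH2", "HH3", "HH4", "HH5", "HH6",
               "HH8", "HH7", "HH9", "HH10", "HH11"),
  position = c(10, 20, 30, 42, 55, 66, 81, 75, 92, 101, 111)
)

# Attach the map position to both sets, keeping every HHid in z.map.
x.HHu.map <- merge(z.map, x.HHu, by = "HHid", all = TRUE)
y.HHo.map <- merge(z.map, y.HHo, by = "HHid", all = TRUE)

# Columns that exist only in the x set (here: "ind3").
extra <- setdiff(names(x.HHu.map), names(y.HHo.map))

# `x.extra` is a hypothetical name: the keys plus the x-only columns.
x.extra <- x.HHu.map[c("HHid", "position", extra)]

# y supplies every shared column; x contributes only its extra columns.
zzz <- merge(y.HHo.map, x.extra, by = c("HHid", "position"), all = TRUE)
zzz <- zzz[order(as.character(zzz$HHid)), ]
rownames(zzz) <- NULL
zzz
```

Because the shared columns are dropped from the x side before merging, there are no `.x`/`.y` suffixes to clean up afterwards.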
Best,
Ista
On Sun, Jul 12, 2015 at 10:45 AM, aldi <aldi at dsgmail.wustl.edu> wrote:
> Hi,
> I have two sets of data x.HHu and y.HHo, rows are IDs and columns are
> individuals. I do not know in advance indv or HHid, both of them will be
> captured from the data. As the y.HHo set updates, y.HHo set has better
> information than x.HHu set. Thus I want a merge where right set
> overwrites left set info based on HHid, i.e. to overwrite x.HHu set with
> y.HHo set but keep any extra info from the x.HHu set that is not present
> in y.HHo set.
> HHids will be complete based on z.map, with the corresponding positions.
> I am having trouble with the part after this line: ###
> ============================================+++++++++++++++++++++++++++
> I am thinking that I am creating new columns "position" "indv1" and
> "indv2", but R is interpreting them as row information.
> See the expected final table at the end. HHid is common, ind3 is from
> x.HHu, and the rest (position, indv1 and indv2) are from y.HHo.
> Any suggestions are appreciated.
> Thank you in advance,
> Aldi
>
> x.HHu<- data.frame(
> HHid = c( 'HH1', 'HH2', 'HH3', 'HH4', 'HH5', 'HH10')
> , indv1 = c( 2, 0, 2 , 0, 2, 0)
> , indv2 = c( 0, NA, 2, 2, 2, 2)
> , ind3 = c( 0, 0, 0, 0, 0, 0)
> )
> ### the HHo data will be the top set to overwrite any HHu data, when
> ### they exist, thinking that HHo are better than HHu results
> ### when they are available
>
> y.HHo<-data.frame(HHid=c('HH1', 'HH2','HH5', 'HH3', 'HH10')
> , indv1 = c(2, 0, 2, 0, NA)
> , indv2 = c(0, 2, 2, 1, 2)
> )
>
> z.map<-data.frame(HHid = c('HH1', 'HH2', 'HH3', 'HH4', 'HH5',
> 'HH6','HH8', 'HH7', 'HH9', 'HH10', 'HH11')
> , position= c(10,20,30,42,55,66,81,75,92,101,111)
> )
> ### see objects
> x.HHu
> y.HHo
> z.map
> ### now sort the map by position, this sorted map will be used to sort
> ### finally all data
> z.map<-z.map[with(z.map, order(position)), ]
> z.map
>
> ### First I introduce position to both sets so I can sort them in
> ### advance by position.
> x.HHu.map <-merge( z.map, x.HHu, by='HHid', all=T)
> x.HHu.map<-x.HHu.map[with(x.HHu.map, order(position)), ]
> x.HHu.map
>
> y.HHo.map <-merge( z.map, y.HHo, by='HHid', all= T)
> y.HHo.map<-y.HHo.map[with(y.HHo.map, order(position)), ]
> y.HHo.map
>
> ### now merge HHu and HHo with the hope to overwrite the HHu set with
> ### HHo wherever they overlap by column names.
> zzz <- merge(x.HHu.map, y.HHo.map, by='HHid', all=T)
> zzz
> ### find common variable names in two sets
>
> commonNames <- names(x.HHu.map)[which(colnames(x.HHu.map) %in%
> colnames(y.HHo.map))]
>
> ## remove HHid which is common for x and y, but work with the rest of columns
> commonNames<-commonNames[-c(1)]
>
> ### ============================================+++++++++++++++++++++++++++
> for(i in 1:length(commonNames)){
>
> print(commonNames[i])
> zzz$commonNames[i] <- NA
>
> print(paste("zzz","$",commonNames[i],".y",sep=""))
>
> zzz$commonNames[i] <- zzz[,paste(commonNames[i],".y",sep="")]
>
> ### paste(zzz$commonNames[i],".x",sep='') <- NULL;
> ### paste(zzz$commonNames[i],".y",sep='') <- NULL;
>
> }
> zzz
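
A note on why the loop above misbehaves, as far as I can tell: `$` does not evaluate what follows it, so `zzz$commonNames[i]` refers to element i of a column literally named "commonNames", not to the column whose name is stored in commonNames[i]. That would explain why the values look like row information. Indexing with `[[` does evaluate its argument. A minimal sketch on a toy data frame (the names `df` and `col` are made up for illustration):

```r
# Toy data: "a" plays the role of an old column, "a.y" the updated one.
df  <- data.frame(a = 1:3, a.y = c(10, 20, 30))
col <- "a"

# `$` looks for a column literally named "col", which does not exist.
is.null(df$col)

# `[[` evaluates `col`, so this overwrites column "a" with the "a.y" values.
df[[col]] <- df[[paste0(col, ".y")]]

# Drop the suffixed column once its values have been copied over.
df[[paste0(col, ".y")]] <- NULL
df
```

The same pattern inside the loop would be `zzz[[commonNames[i]]] <- zzz[[paste0(commonNames[i], ".y")]]`, followed by removing the `.x` and `.y` columns, though dropping the shared columns before the merge avoids the loop entirely.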
>
> The final expected set has to be: HHid is common, ind3 is from x.HHu,
> and the rest (position, indv1 and indv2) are from y.HHo.
>    HHid position ind3 indv1 indv2
> 1   HH1       10    0     2     0
> 2  HH10      101    0    NA     2
> 3  HH11      111   NA    NA    NA
> 4   HH2       20    0     0     2
> 5   HH3       30    0     0     1
> 6   HH4       42    0    NA    NA
> 7   HH5       55    0     2     2
> 8   HH6       66   NA    NA    NA
> 9   HH7       75   NA    NA    NA
> 10  HH8       81   NA    NA    NA
> 11  HH9       92   NA    NA    NA