[R] Merge: how can I keep discarded values?

Chuck Cleland ccleland at optonline.net
Thu Nov 9 14:19:02 CET 2006


Biscarini, Filippo wrote:
> Good morning,
>  
> I am merging two datasets and I would like to save the non-matching rows
> in a separate file.
> The problem is how to retrieve the non-matching rows in R.  
>  
> Example:
>  
> DATASET A
> code   nomi
>   A1 Franco
>   A2  Mario
>   A3 Andrea
>   A4 Sandro
>   A5   Luca
> 
> DATASET B
> code                book
>  A1        Guerra e Pace
>  A1       Storia di Roma
>  A2   La coscienza di Zeno
>  A4              Ivanhoe
>  A1         I Malavoglia
>  A2     Jude the obscure
> 
> When merging, two rows are unmatched:
>  
> A3 Andrea
> A5   Luca
>  
> And these are exactly the rows I would like to store in a separate
> file/dataset.
>  
> I tried with: 
>  
> AM<-merge(A,B,all=TRUE)
> A1<-AM[is.na(AM$book),]
>  
> to keep the rows with a NA value in the book column.
>  
> The problem is that this works in this particular case, but in real
> situations I might have other NA values in the book column that do not
> result from the merge operation but are real missing values: in
> such cases, the is.na command would also retrieve these unneeded
> rows.
>  
> Can someone suggest a better strategy to tackle this problem?

  If I understand, you want to find out which code values are in A but
not in B and write those A rows to a file.  Here is one way to do that:

dfA <- data.frame(code = paste("A", 1:5, sep=""),
                  nomi = c("Franco", "Mario", "Andrea",
                           "Sandro", "Luca"))

dfB <- data.frame(code = c("A1", "A1", "A2", "A4", "A1", "A2"),
                  book = c("Guerra e Pace", "Storia di Roma",
                           "La coscienza di Zeno", "Ivanhoe",
                           "I Malavoglia", "Jude the obscure"))

dfA.nomatch <- subset(dfA, !is.element(code, dfB$code))

write.table(dfA.nomatch, file="myfile.dat")

  See ?is.element and ?subset
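
  If you would rather keep the merge() step, one possible variation is to
flag the rows of B before merging and test the flag instead of the book
column. This is only a sketch: the extra A4 row with a missing book and
the names in.B and unmatched are made up for illustration.

```r
# Same example data as above, plus one genuinely missing book title for A4
# (this extra row is an assumption, added only to show the problem case).
dfA <- data.frame(code = paste("A", 1:5, sep = ""),
                  nomi = c("Franco", "Mario", "Andrea",
                           "Sandro", "Luca"))

dfB <- data.frame(code = c("A1", "A1", "A2", "A4", "A1", "A2", "A4"),
                  book = c("Guerra e Pace", "Storia di Roma",
                           "La coscienza di Zeno", "Ivanhoe",
                           "I Malavoglia", "Jude the obscure", NA))

# Flag every row of B. After merge(all = TRUE) the flag is NA only for
# rows of A that found no partner in B, whatever the book column holds.
dfB$in.B <- TRUE
AM <- merge(dfA, dfB, by = "code", all = TRUE)
unmatched <- AM[is.na(AM$in.B), c("code", "nomi")]

# The code-based test selects the same rows without merging at all.
dfA.nomatch <- dfA[!(dfA$code %in% dfB$code), ]
```

  Here is.na(AM$book) would also pick up the A4 row with the real missing
value, while the flag column and the code comparison both return only the
A3 and A5 rows.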

> Regards,
> Filippo Biscarini
> Wageningen University
> The Netherlands
> 

-- 
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894


