[R] Merge: how can I keep discarded values?
Chuck Cleland
ccleland at optonline.net
Thu Nov 9 14:19:02 CET 2006
Biscarini, Filippo wrote:
> Good morning,
>
> I am merging two datasets and I would like to save the non-matching rows
> in a separate file.
> The problem is how to retrieve the non-matching rows in R.
>
> Example:
>
> DATASET A
> code nomi
> A1 Franco
> A2 Mario
> A3 Andrea
> A4 Sandro
> A5 Luca
>
> DATASET B
> code book
> A1 Guerra e Pace
> A1 Storia di Roma
> A2 La coscienza di Zeno
> A4 Ivanhoe
> A1 I Malavoglia
> A2 Jude the obscure
>
> When merging, two rows are unmatched:
>
> A3 Andrea
> A5 Luca
>
> And these are exactly the rows I would like to store in a separate
> file/dataset.
>
> I tried with:
>
> AM<-merge(A,B,all=TRUE)
> A1<-AM[is.na(AM$book),]
>
> to keep the rows with an NA value in the book column.
>
> The problem is that this works in this particular case, but in real
> situations I might have other NA values in the book column that do not
> result from the merge operation but are real missing values. In such
> cases, the is.na command would also retrieve these unneeded rows.
>
> Can someone suggest a better strategy to tackle this problem?
If I understand, you want to find out which code values are in A but
not in B and write those A rows to a file. Here is one way to do that:

# Example data from the question
dfA <- data.frame(code = paste("A", 1:5, sep=""),
                  nomi = c("Franco", "Mario", "Andrea",
                           "Sandro", "Luca"))

dfB <- data.frame(code = c("A1", "A1", "A2", "A4", "A1", "A2"),
                  book = c("Guerra e Pace", "Storia di Roma",
                           "La coscienza di Zeno", "Ivanhoe",
                           "I Malavoglia", "Jude the obscure"))

# Keep the rows of dfA whose code does not appear anywhere in dfB
dfA.nomatch <- subset(dfA, !is.element(code, dfB$code))

# Write the unmatched rows to a file
write.table(dfA.nomatch, file="myfile.dat")

See ?is.element and ?subset
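
If you would rather stay with merge(), one possible variation is to add a marker column to B before merging and test that column instead of book. The sketch below is only an illustration under stated assumptions: the column name in.B is made up, and one book title has been set to NA on purpose to mimic a real missing value. Rows of A with no partner in B get NA in the marker, while a genuine NA in book leaves the marker TRUE.

```r
# Dataset A from the thread
dfA <- data.frame(code = paste("A", 1:5, sep = ""),
                  nomi = c("Franco", "Mario", "Andrea", "Sandro", "Luca"),
                  stringsAsFactors = FALSE)

# Dataset B, with one genuinely missing book title (assumption, for illustration)
dfB <- data.frame(code = c("A1", "A1", "A2", "A4", "A1", "A2"),
                  book = c("Guerra e Pace", "Storia di Roma",
                           "La coscienza di Zeno", NA,
                           "I Malavoglia", "Jude the obscure"),
                  stringsAsFactors = FALSE)

# Hypothetical marker column: TRUE for every row that really comes from dfB
dfB$in.B <- TRUE

# Keep all rows of dfA; rows with no partner in dfB get NA in in.B
AM <- merge(dfA, dfB, by = "code", all.x = TRUE)

# Identify unmatched rows by the marker, not by the book column
unmatched <- AM[is.na(AM$in.B), c("code", "nomi")]
matched   <- AM[!is.na(AM$in.B), c("code", "nomi", "book")]

print(unmatched)
```

Here unmatched can be passed to write.table() in the same way as dfA.nomatch above. Testing is.na(AM$book) instead would also pick up the A4 row, which is exactly the problem described in the question.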
> Regards,
> Filippo Biscarini
> Wageningen University
> The Netherlands
>
--
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894