[R] quick matching question
Marc Schwartz
marc_schwartz at me.com
Fri Oct 28 16:59:50 CEST 2011
On Oct 28, 2011, at 9:49 AM, Ben Ganzfried wrote:
> Hey,
>
> I'm trying to match patient identifiers from two separate input files, and
> then add information from one of the input files to the corresponding output
> file. I'd greatly appreciate any help!
>
> More specifically,
> Input_File_1 has a column header "bcr_patient_barcode"
> Input_File_2 has a column header "Barcode" and a column header "Batch"
>
> I want my script to match the appropriate patient identifiers since
> "bcr_patient_barcode" and "Barcode" are not in the same order. Then I want
> to add the information from "Batch" to the corresponding patient.
>
> My (incorrect) code is below:
>
> #batch
> tmp <- Input_File_2$Barcode
> tmp1 <- Input_File_1$bcr_patient_barcode
>
> for i in tmp
> for item in tmp1
> if (tmp == tmp1) {
> curated$batch <- Input_File_2$Batch
> }
>
> Thanks!
See ?merge and then use something like:
newDF <- merge(Input_File_2, Input_File_1, by.x = "Barcode", by.y = "bcr_patient_barcode")
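Also, pay attention to the 'all', 'all.x' and 'all.y' arguments. By default (all = FALSE) only records whose keys match in both datasets are retained; setting 'all.x', 'all.y' or 'all' to TRUE also keeps the non-matching records from the first, the second, or both datasets, with NA filled in for the missing columns. merge() performs an "SQL-like" join operation.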
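As a minimal sketch of the difference, here is a toy example. The barcodes ("P1" to "P4") and batch values are made up purely for illustration, and the object names 'inner' and 'keep_patients' are hypothetical; only the column names come from your description.

```r
# Toy stand-ins for the two input files (values are invented).
Input_File_1 <- data.frame(
  bcr_patient_barcode = c("P3", "P1", "P4"),
  stringsAsFactors = FALSE
)
Input_File_2 <- data.frame(
  Barcode = c("P1", "P2", "P3"),
  Batch   = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Default (all = FALSE): only barcodes present in both data frames are kept,
# so "P2" and "P4" are dropped.
inner <- merge(Input_File_2, Input_File_1,
               by.x = "Barcode", by.y = "bcr_patient_barcode")

# all.y = TRUE: keep every patient from Input_File_1 (the 'y' argument here);
# Batch is NA for any patient with no matching Barcode in Input_File_2.
keep_patients <- merge(Input_File_2, Input_File_1,
                       by.x = "Barcode", by.y = "bcr_patient_barcode",
                       all.y = TRUE)

print(inner)
print(keep_patients)
```

Note that the key column in the result takes its name from the first data frame ("Barcode" here), and that the row order of the result generally differs from the order in either input.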
HTH,
Marc Schwartz