Marc Schwartz marc_schwartz at me.com
Wed Mar 3 17:25:01 CET 2010

On Mar 3, 2010, at 7:24 AM, BioStudent wrote:

If you are going to use 'by.x', then you also need to use 'by.y', so that merge() knows which column(s) to match on in each data set. Otherwise, as in my original example with 'by', merge() presumes that the same column name is present in both data sets.
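
As a minimal sketch of the difference (the data frames and column names below are made up for illustration, they are not from your data):

```r
# Hypothetical data: the key column has a different name in each data frame.
samples <- data.frame(sample_id = c("s1", "s2", "s3"),
                      expr = c(2.1, 3.4, 1.8))
annot <- data.frame(id = c("s2", "s3", "s4"),
                    group = c("ctrl", "case", "case"))

# by.x names the key column in the first data frame, by.y in the second.
m1 <- merge(samples, annot, by.x = "sample_id", by.y = "id")

# Once the key has the same name in both, 'by' alone is enough.
names(annot)[names(annot) == "id"] <- "sample_id"
m2 <- merge(samples, annot, by = "sample_id")
```

In both calls the key column in the result takes its name from the first data frame.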

You can use multiple column names in both datasets to define data combinations that result in a unique one-to-one row pairing. The result will also depend upon the settings of 'all', 'all.x' and 'all.y'. See ?merge for the details. The default behavior (all = FALSE) returns only the rows that match between the two datasets.
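
Here is a small sketch of a two-column key and the effect of the 'all' arguments. Again, the data frames and column names are invented for illustration:

```r
# Hypothetical data: a row is identified by the patient/visit combination.
visits <- data.frame(patient = c(1, 1, 2, 3),
                     visit   = c(1, 2, 1, 1),
                     weight  = c(70.2, 69.8, 81.5, 64.0))
labs <- data.frame(patient = c(1, 2, 2, 4),
                   visit   = c(1, 1, 2, 1),
                   glucose = c(5.1, 6.3, 6.0, 4.8))

keys <- c("patient", "visit")

# Default (all = FALSE): only combinations present in both data frames.
inner <- merge(visits, labs, by = keys)

# all.x = TRUE: keep every row of 'visits', with NA where 'labs' has no match.
left <- merge(visits, labs, by = keys, all.x = TRUE)

# all = TRUE: keep every row of both data frames.
full <- merge(visits, labs, by = keys, all = TRUE)
```

Comparing nrow() across the three results is a quick way to see how many rows failed to match on each side.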

If the files are large and you are having memory allocation problems, then you basically have three choices:

1. Increase the amount of RAM that you have in the computer, although a 32 bit OS limits how much of it can actually be used.

2. Move to a 64 bit version of R on a 64 bit OS with sufficient RAM in the computer.

3. Perform your data management tasks using an appropriate database application, rather than in R. This can be done completely in the database, with the result then exported to R, or you can access the database from within R using one of the several methods available (e.g. ODBC). See the R Import/Export Manual at http://cran.r-project.org/doc/manuals/R-data.html.
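
To give a rough idea of the third option, here is a sketch that does the join in the database and pulls only the result into R. It uses the DBI and RSQLite packages with an in-memory SQLite database rather than ODBC, purely so that the example is self-contained. It assumes both packages are installed, and the table and column names are made up:

```r
library(DBI)

# In-memory SQLite database, for illustration only; a real setup would
# connect to an existing database instead.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical tables standing in for the two large files.
dbWriteTable(con, "samples",
             data.frame(id = c("s1", "s2", "s3"),
                        expr = c(2.1, 3.4, 1.8),
                        stringsAsFactors = FALSE))
dbWriteTable(con, "annot",
             data.frame(id = c("s2", "s3", "s4"),
                        grp = c("ctrl", "case", "case"),
                        stringsAsFactors = FALSE))

# The join runs inside the database; only the matched rows come back to R.
merged <- dbGetQuery(con, "
  SELECT s.id, s.expr, a.grp
  FROM samples AS s
  INNER JOIN annot AS a ON s.id = a.id
  ORDER BY s.id")

dbDisconnect(con)
```

The INNER JOIN here corresponds to merge() with all = FALSE.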


