[R] correlating rows of two differently-sized data frames in R

R. Michael Weylandt michael.weylandt at gmail.com
Thu Aug 9 19:57:11 CEST 2012


Hi Jen,

It's generally best to keep cc'ing R-help so others can lend a hand
when I step away from my computer:

On Thu, Aug 9, 2012 at 11:49 AM, Jennifer Hobbs <jenachobbs at gmail.com> wrote:
> Hi Michael -
>
> thanks for the advice - I did find merge() just after posting but I'm having
> difficulty with using it.  I've loaded both datasets; then I tried
>
>> CombinedData<-merge(MethyData1,ExprData1)
>
> but when I looked at CombinedData, I found there was no actual data in it:
>
>> str(CombinedData)
> 'data.frame': 0 obs. of  20 variables

Take a look at

?merge.data.frame

in particular since there are many different forms of merges. Your
original post suggests you may want to set

all = TRUE
by = "Location"
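
As a rough sketch of what that call might look like, here is toy data
shaped like the example in your original post. The values, the column
names Sample1/Sample2, and the suffixes are made up for illustration.
By default merge() joins on every column name the two data frames
share, so if the sample columns have the same names in both, rows only
match when all of those values agree, which would presumably explain
the 0 obs. result.

```r
# Toy stand-ins for the two data sets; all names and values are hypothetical.
DataSet1 <- data.frame(
  Location = c("A", "A", "A", "B"),
  Part     = c(1, 2, 3, 1),
  Sample1  = c(0.1, 0.2, 0.3, 0.4),
  Sample2  = c(1.1, 1.2, 1.3, 1.4)
)
DataSet2 <- data.frame(
  Location = c("A", "B", "C"),
  Sample1  = c(10, 20, 30),
  Sample2  = c(11, 21, 31)
)

# Join on Location only. The suffixes keep the shared sample column
# names apart; all = TRUE also keeps locations found in only one set.
Combined <- merge(DataSet1, DataSet2, by = "Location",
                  all = TRUE, suffixes = c(".set1", ".set2"))
str(Combined)
```

Location C has no partner in DataSet1, so its DataSet1 columns come
back as NA. Drop all = TRUE if you only want locations present in both.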

Hope that helps,
Michael



>
> I thought this might be due to the fact that my column names, as well as the
> row names, in both data sets were the same, so I renamed the column names in
> ExprData1 and tried again:
>
>> colnames(ExprData1)<-NewExprNames
>> merge(ExprData1,MethyData1)
> Error: cannot allocate vector of size 4.2 Gb
> In addition: Warning messages:
> 1: In expand.grid(seq_len(nx), seq_len(ny)) :
>   Reached total allocation of 8055Mb: see help(memory.size)
> 2: In expand.grid(seq_len(nx), seq_len(ny)) :
>   Reached total allocation of 8055Mb: see help(memory.size)
> 3: In expand.grid(seq_len(nx), seq_len(ny)) :
>   Reached total allocation of 8055Mb: see help(memory.size)
> 4: In expand.grid(seq_len(nx), seq_len(ny)) :
>   Reached total allocation of 8055Mb: see help(memory.size)
>
> I was surprised about this, as I'm using a 64-bit computer and it's managed

You'll also need to be using a 64-bit build of R. Merging is pretty
memory-intensive, so if you're right on the edge of what R can handle
you might have to look into a more specialized solution (such as an
SQL backend).
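
The expand.grid() calls in those warnings hint at what may have
happened: once the columns were renamed, the two data frames had no
column names in common, and in that case merge() returns the Cartesian
product of the rows. The sketch below shows that effect on toy data
and then one lighter-weight alternative, a match() lookup, which is a
swapped-in technique and not something merge() does for you. All
object and column names here are hypothetical.

```r
# Toy data; names and values are made up for illustration.
methy <- data.frame(
  Location = c("A", "A", "A", "B"),
  Part     = c(1, 2, 3, 1),
  M1       = c(0.1, 0.2, 0.3, 0.4)
)
expr <- data.frame(
  Location = c("A", "B", "C"),
  S1       = c(10, 20, 30)
)

# With no shared column names, merge() pairs every row with every row.
renamed <- setNames(expr, c("Loc", "E1"))
cart <- merge(methy, renamed)
nrow(cart)  # nrow(methy) * nrow(renamed)

# Alternative: look up the matching expr row for each methy row.
# match() returns one index per methy row (NA if there is no match).
idx <- match(methy$Location, expr$Location)
paired <- methy
paired$E1 <- expr$S1[idx]
paired
```

The match() route assumes each Location appears at most once in the
smaller data set, as in your DataSet2 example.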

> to deal with much larger data sets before now (I know that's not the only
> criterion, but my understanding of computers isn't extensive).  I had
> previously run up against a memory problem because I hadn't transformed my
> data (I thought I was looking at columns, the computer was looking at rows)
> so I tried transforming both data sets and merging again, but I end up with
> another empty data frame:
>
>> tED1<-t(ExprData1)
>> tMD1<-t(MethyData1)
>> CombineData<-merge(tED1,tMD1)
>> str(CombineData)
> 'data.frame': 0 obs. of  152247 variables:
>
> This is where I'm stuck.  Any advice would be hugely appreciated!
>
> Jen
>
> On Thu, Aug 9, 2012 at 5:28 PM, R. Michael Weylandt
> <michael.weylandt at gmail.com> wrote:
>>
>> Perhaps load them both and ?merge can show you the way.
>>
>> Michael
>>
>> On Thu, Aug 9, 2012 at 9:54 AM, JenniferH <jenachobbs at gmail.com> wrote:
>> > Hello everyone,
>> >
>> > I have two sets of data, with the following structure:
>> >
>> > DataSet1
>> > Location   Part    Sample 1   Sample 2
>> > A                     1           value         value
>> > A                     2           value         value
>> > A                     3           value         value
>> > B                     1           value         value
>> >
>> > DataSet2
>> > Location   Sample 1    Sample 2
>> > A                      value          value
>> > B                      value          value
>> > C                      value          value
>> >
>> > I would like to look at the correlations between DataSet1 and DataSet2,
>> > such
>> > that each row in Location A from DataSet1 is paired with the Location A
>> > row
>> > from DataSet2, and so forth.  So far, my only ideas  involve trying to
>> > copy-paste each of the rows in DataSet2 the number of times each occurs
>> > in
>> > DataSet1 on a spreadsheet before loading the sets into R; however, as I
>> > have
>> > approaching 8000 rows in DataSet2, this is clearly not a workable
>> > solution!
>> >
>> > I'm sure there's a simple solution to this, so I'm sorry if this seems
>> > like
>> > a really silly question.
>> >
>> > Thanks for your help!
>> >
>> > Jen
>> >
>> >
>> >
>> > --
>> > View this message in context:
>> > http://r.789695.n4.nabble.com/correlating-rows-of-two-differently-sized-data-frames-in-R-tp4639774.html
>> > Sent from the R help mailing list archive at Nabble.com.
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>
>


