[R] Correlation between matrices

Dennis Murphy djmuser at gmail.com
Sun Nov 6 16:53:03 CET 2011


Hi:

On Sat, Nov 5, 2011 at 11:06 PM, Kaiyin Zhong <kindlychung at gmail.com> wrote:
> Thank you Dennis, your tips are really helpful.
> I don't quite understand the lm(y~mouse) part; my intention was -- in
> pseudo code -- lm(y(Enzyme) ~ y(each elem)).

As I said in my first response, I didn't quite understand what you
were trying to regress so I used the mouse as a way of showing you how
the code works. I think I understand what you want now, though.

I'll create a data set in two ways: the first assumes you have the
data as constructed in your original post and the second generates
random numbers after erecting a 'scaffold' data frame. The game is to
separate the enzyme data from the element data and put them into the
final data frame as separate columns. Then the regression is easy if
that's what you need to do.

# Method 1: Generate the data as you did into separate data frames

elem0 <- c('Cu', 'Fe', 'Zn', 'Ca', 'Enzyme')
regions = c('cortex', 'hippocampus', 'brain_stem', 'mid_brain',
            'cerebellum')

# Creates five 5 x 5 data frames with names V1-V5:
for (n in c('Cu', 'Fe', 'Zn', 'Ca', 'Enzyme')) {
   assign(n, as.data.frame(replicate(5, rnorm(5))))
 }

# Stack the chemical element data using melt() from
# the reshape2 package:
library('reshape2')
d1 <- rbind(melt(Cu), melt(Zn), melt(Fe), melt(Ca))
# Relabel V1 - V5 with brain region names, add a factor
# to distinguish individual elements and tack on the melted
# Enzyme data so that it repeats in each element block
d1 <- within(d1, {
        variable <- factor(d1$variable, labels = regions)
        elem <- factor(rep(elem0[1:4], each = 25))
        Enzyme <- melt(Enzyme)[, 2]
      }     )
# Plot the data using lattice and latticeExtra:
library('lattice')
library('latticeExtra')
p <- xyplot(Enzyme ~ value | variable + elem,
                   data = d1, type = c('p', 'r'))
useOuterStrips(p)

###########################################
## Method 2: Generate the random data after setting
##                 up the element/region/mouse combinations
##

# Generate a data frame from the combinations of
# mice, regions and elem:

library('ggplot2')

mice <- paste('mouse', 1:5, sep = '')
regions = c('cortex', 'hippocampus', 'brain_stem', 'mid_brain',
            'cerebellum')
elem <- elem0[1:4]
d0 <- data.frame(expand.grid(mice = mice,
                             regions = regions, elem = elem))
d0 <- within(d0, {
        value <- rnorm(100)   # generate element values
        Enzyme <- rnorm(25)  # generate enzyme values
      }         )

# the Enzyme values are recycled through all element blocks.

# You can either adapt the lattice code above to plot d0, or you
# can do the following to get an analogous plot in ggplot2.
# It's easier to compute the slopes and intercepts and put
# them into a data frame that ggplot() can import, so that's
# what we'll do first.

# A function to return regression coefficients from a
# generic data frame. Since this function goes into ddply(),
# the argument df is a (generic) data frame and the output
# will be converted to a one-line data frame.

coefun <- function(df) coef(lm(Enzyme ~ value, data = df))

# Apply the function to all regions * elem combinations.
# Output is a data frame of coefficients corresponding to
# each region/element combination

coefs <- ddply(d0, .(regions, elem), coefun)
# Rename the columns
names(coefs) <- c('regions', 'elem', 'b0', 'b1')

# Generate the plot using package ggplot2:
ggplot(d0, aes(x = val, y = Enzyme)) +
   geom_point(size = 2.5) +
   geom_abline(data = coefs, aes(intercept = b0, slope = b1),
                             size = 1) +
   xlab("") +
   facet_grid(elem ~ regions)


> In addition, attach(d) seems necessary before using lm(y~mouse), and
> since d$mouse has a length 125, while each elem for each region has a
> length 5, it generates the following error:

You should never need to use attach() - use the data = argument in
lm() instead, where the value of data is the name of a data frame.
It's always easier to use the modeling functions in R having formula
interfaces with data frames.
>
>> coefs = ddply(d, .(regions, elem), coefun)
> Error in model.frame.default(formula = y ~ mouse, drop.unused.levels = TRUE) :
>  variable lengths differ (found for 'mouse')

You're clearly doing something here that's messing up the structure of
the data. Study what the code (and its output) above are telling you,
particularly if you're not familiar with plyr, lattice and/or ggplot2.
Writing functions to insert into a **ply() function in plyr can be
tricky. If you continue to have problems, please provide a
reproducible example as you did here.

HTH,
Dennis
>
>
> On Sun, Nov 6, 2011 at 12:53 PM, Dennis Murphy <djmuser at gmail.com> wrote:
>>
>> Hi:
>>
>> I don't think you want to keep these objects separate; it's better to
>> combine everything into a data frame. Here's a variation of your
>> example - the x variable ends up being a mouse, but you may have
>> another variable that's more appropriate to plot so take this as a
>> starting point. One plot uses the ggplot2 package, the other uses the
>> lattice and latticeExtra packages.
>>
>> library('ggplot2')
>> regions = c('cortex', 'hippocampus', 'brain_stem', 'mid_brain',
>>            'cerebellum')
>> mice = paste('mouse', 1:5, sep='')
>> elem <- c('Cu', 'Fe', 'Zn', 'Ca', 'Enzyme')
>>
>> # Generate a data frame from the combinations of
>> # mice, regions and elem:
>> d <- data.frame(expand.grid(mice = mice, regions = regions,
>>                            elem = elem), y = rnorm(125))
>> # Create a numeric version of mice
>> d$mouse <- as.numeric(d$mice)
>>
>> # A function to return regression coefficients
>> coefun <- function(df) coef(lm(y ~ mouse), data = df)
>> # Apply to all regions * elem combinations
>> coefs <- ddply(d, .(regions, elem), coefun)
>> names(coefs) <- c('regions', 'elem', 'b0', 'b1')
>>
>> # Generate the plot using package ggplot2:
>> ggplot(d, aes(x = mouse, y = y)) +
>>   geom_point(size = 2.5) +
>>   geom_abline(data = coefs, aes(intercept = b0, slope = b1),
>>                             size = 1) +
>>   facet_grid(elem ~ regions)
>>
>> # Same plot in lattice:
>> library('lattice')
>> library('latticeExtra')
>> p <- xyplot(y ~ mouse | elem + regions, data = d, type = c('p', 'r'),
>>         layout = c(5, 5))
>>
>>
>> HTH,
>> Dennis
>>
>> On Sat, Nov 5, 2011 at 10:49 AM, Kaiyin Zhong <kindlychung at gmail.com> wrote:
>> >> regions = c('cortex', 'hippocampus', 'brain_stem', 'mid_brain',
>> > 'cerebellum')
>> >> mice = paste('mouse', 1:5, sep='')
>> >> for (n in c('Cu', 'Fe', 'Zn', 'Ca', 'Enzyme')) {
>> > +   assign(n, as.data.frame(replicate(5, rnorm(5))))
>> > + }
>> >> names(Cu) = names(Zn) = names(Fe) = names(Ca) = names(Enzyme) = regions
>> >> row.names(Cu) = row.names(Zn) = row.names(Fe) = row.names(Ca) =
>> > row.names(Enzyme) = mice
>> >> Cu
>> >           cortex hippocampus brain_stem  mid_brain cerebellum
>> > mouse1 -0.5436573 -0.31486713  0.1039148 -0.3908665 -1.0849112
>> > mouse2  1.4559136  1.75731752 -2.1195118 -0.9894767  0.3609033
>> > mouse3 -0.6735427 -0.04666507  0.9641000  0.4683339  0.7419944
>> > mouse4  0.6926557 -0.47820023  1.3560802  0.9967562 -1.3727874
>> > mouse5  0.2371585  0.20031393 -1.4978517  0.7535148  0.5632443
>> >> Zn
>> >            cortex hippocampus brain_stem  mid_brain  cerebellum
>> > mouse1 -0.66424043   0.6664478  1.1983546  0.0319403  0.41955740
>> > mouse2 -1.14510448   1.5612235  0.3210821  0.4094753  1.01637466
>> > mouse3 -0.85954416   2.8275458 -0.6922565 -0.8182307 -0.06961242
>> > mouse4  0.03606034  -0.7177256  0.7067217  0.2036655 -0.25542524
>> > mouse5  0.67427572   0.6171704  0.1044267 -1.8636174 -0.07654666
>> >> Fe
>> >           cortex hippocampus  brain_stem  mid_brain cerebellum
>> > mouse1  1.8337008   2.0884261  0.29730413 -1.6884804  0.8336137
>> > mouse2 -0.2734139  -0.5728439  0.63791556 -0.6232828 -1.1352224
>> > mouse3 -0.4795082   0.1627235  0.21775206  1.0751584 -0.5581422
>> > mouse4  1.7125147  -0.5830600  1.40597896 -0.2815305  0.3776360
>> > mouse5 -0.3469067  -0.4813120 -0.09606797  1.0970077 -1.1234038
>> >> Ca
>> >           cortex hippocampus  brain_stem   mid_brain cerebellum
>> > mouse1 -0.7663354   0.8595091  1.33803798 -1.17651576  0.8299963
>> > mouse2 -0.7132260  -0.2626811  0.08025079 -2.40924271  0.7883005
>> > mouse3 -0.7988904  -0.1144639 -0.65901136  0.42462227  0.7068755
>> > mouse4  0.3880393   0.5570068 -0.49969135  0.06633009 -1.3497228
>> > mouse5  1.0077684   0.6023264 -0.57387762  0.25919461 -0.9337281
>> >> Enzyme
>> >           cortex hippocampus  brain_stem  mid_brain cerebellum
>> > mouse1  1.3430936   0.5335819 -0.56992947  1.3565803 -0.8323391
>> > mouse2  1.0520850  -1.0201124  0.89600005  1.4719880  1.0854768
>> > mouse3 -0.2802482   0.6863323 -1.37483570 -0.7790174  0.2446761
>> > mouse4 -0.1916415  -0.4566571  1.93365932  1.3493848  0.2130424
>> > mouse5 -1.0349593  -0.1940268 -0.07216321 -0.2968288  1.7406905
>> >
>> > In each anatomic region, I would like to calculate the correlation between
>> > Enzyme activity and each of the concentrations of Cu, Zn, Fe, and Ca, and
>> > do a scatter plot with a tendency line, organizing those plots into a grid.
>> > See the image below for the desired effect:
>> > http://postimage.org/image/62brra6jn/
>> > How can I achieve this?
>> >
>> > Thank you in advance.
>> >
>> >        [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> >
>



More information about the R-help mailing list