[R] acf() with two df?

Thu May 1 21:15:32 CEST 2003

By way of clarifying an earlier request to this list,
on Wed, 30 Apr 2003, Martin Wegmann wrote:

> I have two kind of datasets, 1. environmental variables
> (several data frames) and  2. one data frame with the
> depending data (zoological data).  All of them are sampled
> on 200 plots, therefore the dataframes correspond column by
> column (not row by row).
>
> variable 1
>          plot 1 plot 2.....
> 1999   ....    ....
> 2000
> 2001
> ...
>
> variable 2
> ....
>
> and another data frame with the diversity and abundance etc. of
> animals for each plot:
>
>         plot 1 plot 2 ...
> div.     ...    ....
> abu.
> ..
>
> and I would like to know the p value of the whole animal data frame
> (not separated by plots [columns]) to each variable data frame.
> I thought that cancor() could do the job but that requires two df
> with matching rows and columns.

Martin  -

My present understanding of the data set is this:  for
each of 200 geographic sampling locations there is a short
time series of historical weather data on several variables
- I'll guess annual rainfall, minimum temperature, maximum
temperature, number of days of sunlight per year, etc. -
and there are many measured outcome variables from a field
study of the diversity and abundance of animal species on
that plot.

Given this understanding, I would regard this as primarily
a regression or correlation problem:  How are the weather
data related to the animal data?  Or:  how well can we
predict the animal diversity (say) on each plot from the
climate characteristics of that plot?  I think the time
series aspects of the climate data are secondary.

I would begin any data analysis by looking at scatterplots
of all the measured outcome data.  See the function pairs().
This function expects a matrix or data frame in which each
sampling plot is one row, and each outcome variable is one column.
This is the transpose of the data frame you described, so
(if "animals" is the name of the data frame described above)
a command like

pairs(t(as.matrix(animals)))

should do the job.  These scatterplots are much easier to
look at than the corresponding correlation matrix.  This
will give an overview of the patterns present in the data
and maybe it will suggest appropriate further analyses.
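
As a small illustration of the transpose step, here is a sketch on
made-up numbers.  The variable names ("div", "abu", "rich"), the
number of plots and the random values are assumptions of mine for
the example, not your data.

```r
# Hypothetical stand-in for the "animals" data frame:
# rows are outcome variables, columns are sampling plots.
set.seed(42)
n_plots <- 8
animals <- data.frame(matrix(runif(3 * n_plots), nrow = 3,
                             dimnames = list(c("div", "abu", "rich"),
                                             paste0("plot", 1:n_plots))))

# Transpose so that each sampling plot is a row and each
# outcome variable is a column, as pairs() expects.
by_plot <- t(as.matrix(animals))
dim(by_plot)            # 8 plots by 3 variables

pairs(by_plot)          # scatterplot matrix of the outcome variables
round(cor(by_plot), 2)  # the corresponding correlation matrix
```

With real data only the t(as.matrix(...)) and pairs() lines are
needed; the rest just builds something to look at.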

When it's time to include the weather data, I would initially
do the following.  For each weather variable at each geographic
location, summarize the historical record by four characteristics:
its mean, linear trend, quadratic term and residual standard
deviation (for instance, from a quadratic fit against year).
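
One possible way to compute these four summaries, sketched on
simulated data.  The data frame "rain", the helper
summarize_series() and the ten-year record are hypothetical, and
fitting a quadratic with lm() and poly() is just one choice of
method.

```r
# Hypothetical weather variable: rows are years, columns are plots.
set.seed(1)
years <- 1999:2008
rain <- as.data.frame(matrix(rnorm(length(years) * 5, mean = 800, sd = 50),
                             nrow = length(years),
                             dimnames = list(years, paste0("plot", 1:5))))

# Summarize one series by its mean, linear and quadratic terms,
# and the residual standard deviation about the quadratic fit.
# poly() uses orthogonal polynomials, so the trend and quadratic
# coefficients are on that scale, not in units per year.
summarize_series <- function(y, year) {
  fit <- lm(y ~ poly(year, 2))
  c(mean     = mean(y),
    trend    = unname(coef(fit)[2]),
    quad     = unname(coef(fit)[3]),
    resid.sd = summary(fit)$sigma)
}

# One row per plot, one column per summary.
rain_summary <- t(sapply(rain, summarize_series, year = years))
rain_summary
```

The fit needs at least four years per series, since three
coefficients are estimated and a residual standard deviation
needs at least one degree of freedom left over.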

Then examine scatterplots of the means of all weather variables
versus animal variables, or the trend of all weather variables
versus animal variables, etc.  This will treat the entire data
set as essentially a regression problem.
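
A rough sketch of that last step, again on invented numbers.  The
data frames "weather_means" and "animals_by_plot", their column
names and the model formula are all assumptions for illustration;
each has one row per sampling plot.

```r
# Hypothetical per-plot summaries, one row per sampling plot.
set.seed(2)
n_plots <- 20
weather_means   <- data.frame(rain = rnorm(n_plots, 800, 50),
                              tmin = rnorm(n_plots, 2, 1))
animals_by_plot <- data.frame(div = rnorm(n_plots),
                              abu = rpois(n_plots, 30))

# One scatterplot for each (weather variable, animal variable) pair.
op <- par(mfrow = c(ncol(animals_by_plot), ncol(weather_means)))
for (a in names(animals_by_plot)) {
  for (w in names(weather_means)) {
    plot(weather_means[[w]], animals_by_plot[[a]], xlab = w, ylab = a)
  }
}
par(op)

# A first regression of one outcome on the weather means.
fit <- lm(div ~ rain + tmin, data = cbind(weather_means, animals_by_plot))
summary(fit)
```

The same loop works with the trends or residual standard
deviations in place of the means.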

I know that I am taking a BIG risk by offering data analysis
suggestions in a public forum!  Other people will have quite
different preferences and strongly held opinions about them.
So be it.

-  tom blackwell  -  u michigan medical school  -  ann arbor  -