[R] xyplot() or splom()?: two factors from same data frame

Rich Shepard rshepard at appl-ecosys.com
Fri Oct 21 15:47:43 CEST 2011


On Fri, 21 Oct 2011, Duncan Mackay wrote:

> Without a dataset I am not sure what you need.

Duncan,

   Some of the problems I'm trying to resolve come from changing priorities
from my client and the regulators. I end up stopping one process and
starting on a different one. But that's life in the real world of
environmental consulting. :-)

   What I need now is to compare TDS (total dissolved solids) with specific
conductivity and the ions that normally make up TDS. Before running any
regression models I need to look at these data from three points of view:
all data from all sites collected during the past 30 years; average (or
total) concentrations within a stream having multiple collection sites (I
have not yet decided which makes more ecological sense); and by site within
certain streams.
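A minimal sketch of the second and third views, assuming the long layout that str(chemdata) shows (one row per site, date, and parameter). The data frame `chem` is a small made-up stand-in: its site, stream, and basin names and all values are hypothetical, and "TDS"/"Cond" are assumed level names for the param factor. The first view needs no aggregation, since it is the data as collected.

```r
# Made-up stand-in for chemdata (same columns; names and values hypothetical)
chem <- data.frame(
  site     = factor(rep(c("S1", "S2", "S3"), each = 2)),
  sampdate = as.Date(rep(c("2006-12-06", "2007-01-10", "2007-02-14"), each = 2)),
  param    = factor(rep(c("TDS", "Cond"), times = 3)),
  quant    = c(280, 410, 300, 450, 120, 190),
  stream   = factor(rep(c("StrA", "StrA", "StrB"), each = 2)),
  basin    = factor(rep(c("BasinEast", "BasinEast", "BasinWest"), each = 2))
)

# View 2: mean of each parameter within each stream, pooling sites and dates
# (use FUN = sum instead if totals make more ecological sense)
by_stream <- aggregate(quant ~ stream + param, data = chem, FUN = mean)

# View 3: mean of each parameter by site within each stream
by_site <- aggregate(quant ~ stream + site + param, data = chem, FUN = mean)
```

aggregate() only returns combinations that actually occur in the data, so streams without a given parameter simply have no row for it.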

   I think that I need to subset the data frame to create distinct analytical
data frames for each comparison, then rm() them until needed again (or I'd
have a very large number of files in the directory). If I have a subset, for
example, of TDS and conductivity regardless of sample date or location, I
will have two columns of numbers that will fit the xyplot() formula, e.g.,
xyplot(TDS ~ Cond). This is the broad picture. I can then use the
hydrographic basins (2 of 'em) or streams (24 of 'em) as factors to
condition the analysis. Repeat for other parameter pairs (TDS vs. Ca, TDS
vs. Mg, etc.).
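In a long layout (one row per site, date, and parameter) TDS and Cond are not separate columns, so xyplot(TDS ~ Cond) cannot be called on the data as stored. One way to build the two-column subset, sketched here on made-up stand-in data with assumed level names "TDS" and "Cond", is to pull out each parameter and merge() on site and sampling date, so only measurements taken at the same site on the same day are paired. The helper pair_params() is hypothetical, not part of any package; calling it again with "Ca" or "Mg" rebuilds a subset on demand instead of keeping it around.

```r
# Made-up stand-in for chemdata (same columns; names and values hypothetical)
chem <- data.frame(
  site     = factor(rep(c("S1", "S2", "S3"), each = 2)),
  sampdate = as.Date(rep(c("2006-12-06", "2007-01-10", "2007-02-14"), each = 2)),
  param    = factor(rep(c("TDS", "Cond"), times = 3)),
  quant    = c(280, 410, 300, 450, 120, 190),
  stream   = factor(rep(c("StrA", "StrA", "StrB"), each = 2)),
  basin    = factor(rep(c("BasinEast", "BasinEast", "BasinWest"), each = 2))
)

# Hypothetical helper: pair parameter y with parameter x measured at the
# same site on the same date, keeping stream and basin for conditioning
pair_params <- function(data, y, x) {
  ys <- data[data$param == y, c("site", "sampdate", "stream", "basin", "quant")]
  xs <- data[data$param == x, c("site", "sampdate", "quant")]
  names(ys)[names(ys) == "quant"] <- y
  names(xs)[names(xs) == "quant"] <- x
  merge(ys, xs, by = c("site", "sampdate"))
}

tds_cond <- pair_params(chem, "TDS", "Cond")

# lattice ships with R as a recommended package; this only builds the
# trellis object, and print(p) draws it on the current device
if (requireNamespace("lattice", quietly = TRUE)) {
  p <- lattice::xyplot(TDS ~ Cond | basin, data = tds_cond)
}
```

Replacing `| basin` with `| stream` conditions on streams instead, and `groups = stream` overlays streams within each basin panel.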

   Another part of the issue, perhaps, is that the data are in a single
long-format data frame, one row per site, date, and parameter:

  str(chemdata)
'data.frame':	47244 obs. of  6 variables:
  $ site    : Factor w/ 143 levels "BC-0.5","BC-1",..: 134 134 134 127 127
  $ sampdate: Date, format: "2006-12-06" "2006-12-06" ...
  $ param   : Factor w/ 66 levels "AGP","ANP","ANP/AGP",..: 58 66 12 24 59 66
  $ quant   : num  1.08e+04 7.95 1.80e-02 2.80e+02 1.90e+01 8.44 1.62e+03
  $ stream  : Factor w/ 24 levels "BCrk","CCrk",..: 4 4 4 21 21 21 4
  $ basin   : Factor w/ 2 levels "BasinEast","BasinWest": 1 1 1 1 1 1 1 1 1 2 ...

while all the data sets used in the books I've read are simpler. What I've
not read is guidance on how complex data sets could (or should) be
partitioned into smaller but still related data sets to facilitate analyses.
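One common base-R way to partition related data without creating many separate objects or files is split(), which returns a named list of data frames, one per factor level (or per observed combination of levels); sapply() or lapply() then runs the same step on every piece. A minimal sketch on a made-up stand-in for chemdata, with hypothetical stream and basin names and an assumed "TDS" level:

```r
# Made-up stand-in for chemdata (same columns; names and values hypothetical)
chem <- data.frame(
  site     = factor(rep(c("S1", "S2", "S3"), each = 2)),
  sampdate = as.Date(rep(c("2006-12-06", "2007-01-10", "2007-02-14"), each = 2)),
  param    = factor(rep(c("TDS", "Cond"), times = 3)),
  quant    = c(280, 410, 300, 450, 120, 190),
  stream   = factor(rep(c("StrA", "StrA", "StrB"), each = 2)),
  basin    = factor(rep(c("BasinEast", "BasinEast", "BasinWest"), each = 2))
)

# One data frame per stream; drop = TRUE skips levels with no rows
by_stream_list <- split(chem, chem$stream, drop = TRUE)

# Nested partition: one data frame per observed basin/stream combination
by_basin_stream <- split(chem, list(chem$basin, chem$stream), drop = TRUE)

# The same step applied to every partition: range of TDS in each stream
tds_range <- sapply(by_stream_list, function(d) range(d$quant[d$param == "TDS"]))
```

The list lives in a single object, so one rm() (or one saveRDS() file) covers every partition.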

   I hope this clarifies my initial request.

Rich



More information about the R-help mailing list