[R] Working With Variables Having Different Lengths

Rich Shepard rshepard at appl-ecosys.com
Fri Oct 21 21:02:19 CEST 2011


On Fri, 21 Oct 2011, David Winsemius wrote:

> First you need to clarify whether "TDS" is the name of a column or a
> possible value in a column named "param". This whole painful
> multi-question process would be greatly accelerated if you offered
> str(chemdata).

   Yes, I did on a different thread, but not on this one.

str(chemdata)
'data.frame':	47244 obs. of  6 variables:
  $ site    : Factor w/ 143 levels "BC-0.5","BC-1",..: 134 134 134 127 127
  $ sampdate: Date, format: "2006-12-06" "2006-12-06" ...
  $ param   : Factor w/ 66 levels "AGP","ANP","ANP/AGP",..: 58 66 12 24 59 66
  $ quant   : num  1.08e+04 7.95 1.80e-02 2.80e+02 1.90e+01 8.44 1.62e+03
  $ stream  : Factor w/ 24 levels "B","C",..: 4 4 4 21 21 21 4
  $ basin   : Factor w/ 2 levels "Basin1","Basin2": 1 1 1 1 1 1 1 1 1 2 ...

   What I need to do is examine the relationships between the parameter "TDS"
and other parameters associated with it; e.g., "Cond" and "SO4". I started
by subsetting the main data frame (chemdata):

tds.basin <- subset(chemdata, param == "TDS",
                    select = c(param, quant, basin), na.rm = TRUE, drop = TRUE)

cond.basin <- subset(chemdata, param == "Cond",
                     select = c(param, quant, basin), na.rm = TRUE, drop = TRUE)

However, these left the NA rows in the new data frames.
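
For reference, a minimal sketch of one way to drop the NA rows while subsetting. As far as I can tell, subset() on a data frame has no na.rm argument, so the extra argument is silently ignored; the NA test has to go into the row condition itself. The data frame `chem` and its values below are hypothetical stand-ins for chemdata, not the real data.

```r
# Hypothetical toy data standing in for chemdata (values invented for illustration)
chem <- data.frame(
  site  = c("A", "A", "B", "B", "C", "C"),
  param = factor(c("TDS", "Cond", "TDS", "Cond", "TDS", "Cond")),
  quant = c(10800, 280, NA, 3170, 3838, NA),
  basin = factor(c("Basin1", "Basin1", "Basin2", "Basin2", "Basin2", "Basin2"))
)

# Put the NA test in the row condition; subset() does not take na.rm
tds <- subset(chem, param == "TDS" & !is.na(quant),
              select = c(param, quant, basin))

# Optionally drop the unused factor levels left over from the full data
tds <- droplevels(tds)
```

This only removes rows whose quant is NA; it does not make the TDS and Cond subsets line up with each other.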

   I can produce an xyplot() using tds.basin$quant and cond.basin$quant, but
it's obvious there are many points where one or the other has an NA value.
When I tried a linear regression it failed because the two data frames have
different numbers of rows.

   What I need to learn is: 1) how to write the subset() to remove the NA
rows for each one and 2) how to perform linear regression (and further
analyses) on these pairs of data frames.

> If you do not offer both the code and the verbatim copy of the error there
> will be very little that we can do to diagnose your problem.

str(tds.basin)
'data.frame':	2206 obs. of  3 variables:
  $ param: Factor w/ 66 levels "AGP","ANP","ANP/AGP",..: 58 58 58 58 58 58 58
  $ quant: num  10800 530 3838 3658 3756 ...
  $ basin: Factor w/ 2 levels "Basin1","Basin2": 1 2 2 2 2 2 2 2 2 2 ...

str(cond.basin)
'data.frame':	1191 obs. of  3 variables:
  $ param: Factor w/ 66 levels "AGP","ANP","ANP/AGP",..: 24 24 24 24 24 24 24
  $ quant: num  280 3170 4220 3420 3700 ...
  $ basin: Factor w/ 2 levels "Basin1","Basin2": 1 2 2 2 2 2 2 2 2 2 ...

then,

  m1 <- lm(tds.basin$quant ~ cond.basin$quant)
Error in model.frame.default(formula = tds.basin$quant ~ cond.basin$quant,  :
   variable lengths differ (found for 'cond.basin$quant')
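
The error arises because lm() needs the response and predictor to be vectors of equal length whose elements correspond, and the two subsets have 2206 and 1191 rows. A sketch of one possible approach follows: keep the columns that identify a sample, pair the TDS and Cond values with merge(), and fit the model on the paired data. It assumes that site and sampdate together identify a sample (if a site has several measurements of one parameter on the same date, merge() returns every combination). The data frame `chem`, its values, and the column names quant.tds and quant.cond are hypothetical.

```r
# Hypothetical toy data standing in for chemdata (values invented for illustration)
chem <- data.frame(
  site     = rep(c("S1", "S2", "S3", "S4", "S5"), each = 2),
  sampdate = as.Date("2006-12-06"),
  param    = rep(c("TDS", "Cond"), times = 5),
  quant    = c(10800, 280, 530, 3170, 3838, 4220, NA, 3420, 3756, NA)
)

# Keep the identifying columns and drop NA measurements
tds  <- subset(chem, param == "TDS"  & !is.na(quant),
               select = c(site, sampdate, quant))
cond <- subset(chem, param == "Cond" & !is.na(quant),
               select = c(site, sampdate, quant))

# Pair the two parameters sample by sample; unmatched samples are dropped
paired <- merge(tds, cond, by = c("site", "sampdate"),
                suffixes = c(".tds", ".cond"))

# Both variables now come from the same rows, so their lengths agree
m1 <- lm(quant.tds ~ quant.cond, data = paired)
```

Fitting from a single paired data frame also keeps basin or other columns available for later analyses if they are carried through the merge.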

Rich


