[R] Working With Variables Having Different Lengths
Rich Shepard
rshepard at appl-ecosys.com
Fri Oct 21 21:02:19 CEST 2011
On Fri, 21 Oct 2011, David Winsemius wrote:
> First you need to clarify whether "TDS" is the name of a column or a
> possible value in a column named "param". This whole painful
> multi-question process would be greatly accelerated if you offered
> str(chemdata).
Yes, I did on a different thread, but not on this one.
str(chemdata)
'data.frame': 47244 obs. of 6 variables:
$ site : Factor w/ 143 levels "BC-0.5","BC-1",..: 134 134 134 127 127
$ sampdate: Date, format: "2006-12-06" "2006-12-06" ...
$ param : Factor w/ 66 levels "AGP","ANP","ANP/AGP",..: 58 66 12 24 59 66
$ quant : num 1.08e+04 7.95 1.80e-02 2.80e+02 1.90e+01 8.44 1.62e+03
$ stream : Factor w/ 24 levels "B","C",..: 4 4 4 21 21 21 4
$ basin : Factor w/ 2 levels "Basin1","Basin2": 1 1 1 1 1 1 1 1 1 2 ...
What I need to do is examine the relationships between the parameter "TDS"
and other parameters associated with it; e.g., "Cond" and "SO4". I started
by subsetting the main data frame (chemdata)
tds.basin <- subset(chemdata, param == "TDS",
  select = c(param, quant, basin), na.rm = TRUE, drop = TRUE)
cond.basin <- subset(chemdata, param == "Cond",
  select = c(param, quant, basin), na.rm = TRUE, drop = TRUE)
However, these left the NA rows in the new data frames.
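A minimal sketch of one way to drop the NA rows, assuming the NAs are in the quant column. As far as I can tell, subset() for data frames does not document an na.rm argument, so it is most likely swallowed by `...` and ignored. The toy data frame below is a hypothetical stand-in for chemdata, with invented values; the idea is to put the NA test into the row condition itself.

```r
# Toy stand-in for chemdata; all values are invented for illustration.
toy <- data.frame(
  site     = c("A", "A", "B", "B", "C"),
  sampdate = as.Date(c("2006-12-06", "2006-12-06", "2007-01-10",
                       "2007-01-10", "2007-02-01")),
  param    = c("TDS", "Cond", "TDS", "Cond", "TDS"),
  quant    = c(10800, 280, NA, 3170, 530)
)

# Keep only TDS rows whose quant is not NA.
# site and sampdate are retained so rows can be matched up later.
tds <- subset(toy, param == "TDS" & !is.na(quant),
              select = c(site, sampdate, quant))

nrow(tds)
```

An alternative under the same assumption would be to subset first and then call na.omit() on the result, which drops rows with an NA in any column.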
I can produce an xyplot() using tds.basin$quant and cond.basin$quant, but
it's obvious there are many points where one or the other has an NA value.
When I tried a linear regression it failed because the two data frames have
unequal numbers of rows.
I need to learn two things: 1) how to write the subset() call to remove the NA
rows for each one and 2) how to perform linear regression (and further
analyses) on these pairs of data frames.
> If you do not offer both the code and the verbatim copy of the error there
> will be very little that we can do to diagnose your problem.
str(tds.basin)
'data.frame': 2206 obs. of 3 variables:
$ param: Factor w/ 66 levels "AGP","ANP","ANP/AGP",..: 58 58 58 58 58 58 58
$ quant: num 10800 530 3838 3658 3756 ...
$ basin: Factor w/ 2 levels "Basin1","Basin2": 1 2 2 2 2 2 2 2 2 2 ...
str(cond.basin)
'data.frame': 1191 obs. of 3 variables:
$ param: Factor w/ 66 levels "AGP","ANP","ANP/AGP",..: 24 24 24 24 24 24 24
$ quant: num 280 3170 4220 3420 3700 ...
$ basin: Factor w/ 2 levels "Basin1","Basin2": 1 2 2 2 2 2 2 2 2 2 ...
then,
m1 <- lm(tds.basin$quant ~ cond.basin$quant)
Error in model.frame.default(formula = tds.basin$quant ~ cond.basin$quant, :
  variable lengths differ (found for 'cond.basin$quant')
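The error arises because lm() needs the response and the predictor to have the same length, and here they are 2206 and 1191 values long. One possible approach, sketched below, is to pair the TDS and Cond measurements before fitting. It assumes that a TDS value and a Cond value belong together when they share the same site and sampdate, which the post does not confirm, and that those two columns are kept in the subsets rather than dropped by select. The small data frames and their values are invented for illustration.

```r
# Hypothetical per-parameter subsets that keep site and sampdate.
tds <- data.frame(
  site     = c("A", "B", "C", "D", "E"),
  sampdate = as.Date("2006-12-06") + 0:4,
  quant    = c(100, 200, 300, 400, 500)
)
cond <- data.frame(
  site     = c("A", "B", "C", "D"),
  sampdate = as.Date("2006-12-06") + 0:3,
  quant    = c(150, 310, 440, 610)
)

# Inner join: keep only site/date combinations present in both subsets.
# The two quant columns are renamed quant.tds and quant.cond.
paired <- merge(tds, cond, by = c("site", "sampdate"),
                suffixes = c(".tds", ".cond"))

# Both variables now come from the same rows, so their lengths agree.
m1 <- lm(quant.tds ~ quant.cond, data = paired)
coef(m1)
```

If a site has several measurements of one parameter on the same date, merge() returns every combination of them, so it may be worth checking for duplicates or aggregating first.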
Rich