[R] Working With Variables Having Different Lengths
David Winsemius
dwinsemius at comcast.net
Sat Oct 22 04:05:20 CEST 2011
On Oct 21, 2011, at 8:14 PM, Rich Shepard wrote:
> On Fri, 21 Oct 2011, David Winsemius wrote:
>
>> The only variable in that dataframe with what appears to be a continuous
>> value (which is how I would expect "total dissolved solids" to be
>> measured) is "quant". Are you saying that the value of quant is measuring
>> something with different units depending on the value of 'param' and that
>> 'site' and 'date' should be used to identify associated measurements? This
>> would appear to be the case based on what you are saying below.
>
> David,
>
> 'Quant' is the measured concentration of the different chemicals
> identified in 'param'. I want to plot (and model) the quant values
> associated with 'TDS' and other chemicals, preferably from samples at the
> same location and date. Units are mg/L except for pH (standard units) and
> specific conductance (microSiemens/cm).
>
> What I'm not understanding is how to specify the 'quant' values for the
> params 'TDS' and 'Cond' (for example) for an xyplot() or lm().
>
>> If this is so, the problem is to break apart the dataframe by type
>> of measurement ('param'). One way would be to split into separate
>> dataframes, then merge back together by an appropriate linkage on
>> site and date. I'm guessing that 'stream' and 'basin' are
>> superfluous for the matching and can be later associated with 'site'?
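The split-then-merge idea can be sketched as follows. This is only a minimal illustration on a made-up four-column data frame (the real 'chemdata' is not shown in this thread), with dates stored as plain ISO strings to keep the toy example simple; the object names `tds`, `cond` and `paired` are hypothetical.

```r
# Hypothetical toy data standing in for the real 'chemdata'.
chemdata <- data.frame(
  site     = c("A", "A", "B", "B", "A"),
  sampdate = c("2011-01-05", "2011-01-05", "2011-01-05",
               "2011-01-05", "2011-02-10"),
  param    = c("TDS", "Cond", "TDS", "Cond", "TDS"),
  quant    = c(320, 510, 290, 470, 305),
  stringsAsFactors = FALSE
)

# One data frame per parameter, keeping only the linkage columns and the value.
tds  <- subset(chemdata, param == "TDS",  select = c(site, sampdate, quant))
cond <- subset(chemdata, param == "Cond", select = c(site, sampdate, quant))

# Rename the value column so the two do not collide after merging.
names(tds)[names(tds) == "quant"]   <- "TDS"
names(cond)[names(cond) == "quant"] <- "Cond"

# all = TRUE keeps site/date pairs that lack one of the measurements (as NA).
paired <- merge(tds, cond, by = c("site", "sampdate"), all = TRUE)
print(paired)
```

With the two measurements side by side, calls such as lm(TDS ~ Cond, data = paired) or lattice's xyplot(TDS ~ Cond, data = paired) can refer to them directly.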
>
> Yes, stream and basin are supersets of site. I used subset() to create
> separate dataframes from a set I called 'streamdata' (which aggregated the
> sites in an individual stream into one), but I'm not satisfied with how I
> did that and would rather learn to work with the overall 'chemdata' set.
>
>> The goal would be a dataframe with 7 renamed 'param' columns
>> ('TDS', 'Cond', 'Mg', 'SO4', 'Cl', 'Na', and 'Ca') and two
>> identifier columns ('site' and 'sampdate'). For the moment I would
>> think you would want all the data together and not make any
>> decisions about excluding NA values until you get an overall
>> picture of the situation.
>
> I agree that's what I want.
>
>> The first thing I would try would be
>>
>> with(subset(chemdata, param %in% c('TDS', 'Cond', 'Mg', 'SO4',
>> 'Cl', 'Na', 'Ca'), 1:4),
>> xtabs(quant ~ site + sampdate + param))
>>
>> You would get 7 tables, one for each 'param', with up to 143 rows and
>> as many columns as you have sampdates.
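To see what the xtabs() result looks like, here is a small sketch on a made-up data frame (hypothetical values, only two parameters, dates as ISO strings). One thing worth knowing: xtabs() sums 'quant' within each cell, so a site/date/param combination with no sample shows up as 0 rather than NA.

```r
# Hypothetical toy data standing in for the real 'chemdata'.
chemdata <- data.frame(
  site     = c("A", "A", "B", "B", "A"),
  sampdate = c("2011-01-05", "2011-01-05", "2011-01-05",
               "2011-01-05", "2011-02-10"),
  param    = c("TDS", "Cond", "TDS", "Cond", "TDS"),
  quant    = c(320, 510, 290, 470, 305),
  stringsAsFactors = FALSE
)

tab <- with(subset(chemdata, param %in% c("TDS", "Cond")),
            xtabs(quant ~ site + sampdate + param))

dim(tab)               # sites x sampdates x params
print(tab[, , "TDS"])  # the site-by-date table for one parameter
```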
>>
>> This might be a good use for package reshape2 since it generally
>> returns a dataframe. The above operation would return an array with
>> 3 dimensions. You might get immediate success with something like:
>
>> dcast( subset(chemdata, param %in% c('TDS', 'Cond', 'Mg', 'SO4',
>> 'Cl', 'Na', 'Ca'), 1:4),
>> site + sampdate ~ param)
>> # the omitted variable name should end up in the values columns
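A minimal sketch of the dcast() route, assuming package reshape2 is installed and using a made-up data frame (hypothetical values, dates as ISO strings). Here the value column is named explicitly with value.var rather than left for dcast() to guess; the name `wide` is just for illustration.

```r
library(reshape2)

# Hypothetical toy data standing in for the real 'chemdata'.
chemdata <- data.frame(
  site     = c("A", "A", "B", "B", "A"),
  sampdate = c("2011-01-05", "2011-01-05", "2011-01-05",
               "2011-01-05", "2011-02-10"),
  param    = c("TDS", "Cond", "TDS", "Cond", "TDS"),
  quant    = c(320, 510, 290, 470, 305),
  stringsAsFactors = FALSE
)

# One row per site/sampdate, one column per param; missing combinations are NA.
wide <- dcast(subset(chemdata, param %in% c("TDS", "Cond")),
              site + sampdate ~ param, value.var = "quant")
print(wide)
```

Unlike the xtabs() array, this keeps a missing measurement as NA instead of 0, which is probably what you want before modelling.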
>
>> To do your testing it might be wise to apply more selective use of
>> subset. Perhaps only go for a few sites and dates.
>
> OK. I need to read and increase my understanding of with() and learn
> dcast(). May not get to all this over the weekend, but I'll be back with
> results.
`with` is not what's doing the work. I just use `with` to simplify the
code. It is like a local version of `attach`. Within the
"parenthetical" enclosure of the `with` function you can refer to the
(unquoted) column names as objects. I could have referred to them with
chemdata[["param"]] instead of with(chemdata, ..... param ...)
These are all equivalent:
with(chemdata, table(site, param))
table(chemdata$site, chemdata$param)
table(chemdata[["site"]], chemdata[["param"]])
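If you want to convince yourself, a quick check on a made-up data frame (hypothetical values, not the real 'chemdata') could look like this. The counts agree across the three forms; the only visible difference is that the `with` version labels the table dimensions with the column names.

```r
# Hypothetical toy data standing in for the real 'chemdata'.
chemdata <- data.frame(
  site  = c("A", "A", "B", "B", "A"),
  param = c("TDS", "Cond", "TDS", "Cond", "TDS"),
  stringsAsFactors = FALSE
)

t1 <- with(chemdata, table(site, param))
t2 <- table(chemdata$site, chemdata$param)
t3 <- table(chemdata[["site"]], chemdata[["param"]])

all(t1 == t2)        # same counts
all(t2 == t3)        # same counts
names(dimnames(t1))  # only the with() form carries "site" and "param"
```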
--
David.
>
> Thanks very much,
>
> Rich
David Winsemius, MD
West Hartford, CT