[R] Working With Variables Having Different Lengths

Rich Shepard rshepard at appl-ecosys.com
Sat Oct 22 02:14:48 CEST 2011


On Fri, 21 Oct 2011, David Winsemius wrote:

> The only variable in that dataframe with what appears to be a continuous
> value (which is how I would expect "total dissolved solids" to be
> measured) is "quant". Are you saying that the value of quant is measuring
> something with different units depending on the value of 'param' and that
> 'site' and 'date' should be used to identify associated measurements? This
> would appear to be the case based on what you are saying below.

David,

   'Quant' is the measured concentration of the different chemicals
identified in 'param'. I want to plot (and model) the quant values
associated with 'TDS' and other chemicals, preferably from samples at the
same location and date. Units are mg/L except for pH (standard units) and
specific conductance (microSiemens/cm).

   What I'm not understanding is how to specify the 'quant' values for the
params 'TDS' and 'Cond' (for example) for an xyplot() or lm().

> If this is so, the problem is to break apart the dataframe by type of
> measurement ('param'). One way would be to split into separate dataframes,
> then merge them back together by an appropriate linkage on site and date. I'm
> guessing that 'stream' and 'basin' are superfluous for the matching and can
> be later associated with 'site'?

   Yes, stream and basin are supersets of site. I used subset() to create
separate dataframes from a set I called 'streamdata' (which aggregated the
sites in an individual stream into one), but I'm not satisfied with how I
did that and would rather learn to work with the overall 'chemdata' set.
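
   For reference, the split-then-merge route described above might look
roughly like the sketch below in base R. The data are invented toy values;
only the column names (site, sampdate, param, quant) come from this thread,
and the object names tds, cond, and paired are hypothetical.

```r
# Toy long-format data; column names follow the thread, values are invented.
chemdata <- data.frame(
  site     = c("A", "A", "A", "B", "B", "C", "D", "D"),
  sampdate = as.Date(c("2011-01-05", "2011-01-05", "2011-02-09",
                       "2011-01-05", "2011-01-05", "2011-03-01",
                       "2011-03-01", "2011-03-01")),
  param    = c("TDS", "Cond", "TDS", "TDS", "Cond", "Cond", "TDS", "Cond"),
  quant    = c(320, 510, 298, 410, 640, 580, 350, 555)
)

# One data frame per parameter, keeping only the keys and the measurement.
tds  <- subset(chemdata, param == "TDS",  select = c(site, sampdate, quant))
cond <- subset(chemdata, param == "Cond", select = c(site, sampdate, quant))
names(tds)[3]  <- "TDS"
names(cond)[3] <- "Cond"

# all = TRUE keeps unmatched site/date pairs and fills the gap with NA.
paired <- merge(tds, cond, by = c("site", "sampdate"), all = TRUE)

# lm() drops incomplete rows by default (na.action = na.omit).
fit <- lm(TDS ~ Cond, data = paired)
```

   With the two parameters side by side, the same formula should also work
for a lattice plot, e.g. xyplot(TDS ~ Cond, data = paired).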

> The goal would be a dataframe with 7 renamed 'param' columns ('TDS', 'Cond',
> 'Mg', 'SO4', 'Cl', 'Na', and 'Ca') and two identifier columns ('site' and
> 'sampdate'). For the moment I would think you would want all the data together
> and not make any decisions about excluding NA values until you get an overall
> picture of the situation.

   I agree that's what I want.

> The first thing I would try would be
>
> with(subset(chemdata, param %in% c('TDS', 'Cond', 'Mg', 'SO4', 'Cl', 'Na',
> 'Ca') , 1:4) ,
>    xtabs(quant ~ site + sampdate + param) )
>
> You would get 7 tables, one for each 'param', with up to 143 rows and as many
> columns as you have sampdates.
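
   One thing worth checking with the xtabs() approach: it sums 'quant'
within each cell, so a site/date/param combination that was never sampled
comes back as 0 rather than NA. A small sketch with invented data (only the
column names are from this thread; tab, n, and unsampled are hypothetical
names):

```r
# Toy data: site A has no 'Cond' measurements at all.
chemdata <- data.frame(
  site     = c("A", "A", "B", "B", "B"),
  sampdate = c("2011-01-05", "2011-02-09", "2011-01-05",
               "2011-01-05", "2011-02-09"),
  param    = c("TDS", "TDS", "TDS", "Cond", "Cond"),
  quant    = c(320, 298, 410, 640, 615)
)

# Three-dimensional table: site x sampdate x param.
tab <- xtabs(quant ~ site + sampdate + param, data = chemdata)

# This cell was never sampled, but xtabs() reports it as a zero sum.
unsampled <- tab["A", "2011-01-05", "Cond"]

# Count the observations per cell and blank out the empty ones.
n <- xtabs(~ site + sampdate + param, data = chemdata)
tab[n == 0] <- NA
```

   Whether the zeros matter depends on the data; for concentrations a true
zero and a missing sample probably should not be confused.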
>
> This might be a good use for package reshape2 since it generally returns a 
> dataframe. The above operation would return an array with 3 dimensions. You 
> might get immediate success with something like:

> dcast( subset(chemdata, param %in% c('TDS', 'Cond', 'Mg', 'SO4', 'Cl', 'Na',
> 'Ca') , 1:4) ,
>    site + sampdate ~ param)
> # the omitted variable name should end up in the values columns
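
   A self-contained version of the dcast() idea, as a sketch only: it
assumes the reshape2 package is installed, uses invented data, and names the
value column explicitly with value.var instead of relying on dcast() to
guess it. The names wanted and wide are hypothetical.

```r
library(reshape2)  # assumption: reshape2 is installed

# Toy long-format data; column names follow the thread, values are invented.
chemdata <- data.frame(
  site     = c("A", "A", "A", "B", "B", "C"),
  sampdate = c("2011-01-05", "2011-01-05", "2011-02-09",
               "2011-01-05", "2011-01-05", "2011-03-01"),
  param    = c("TDS", "Cond", "TDS", "TDS", "Cond", "Cond"),
  quant    = c(320, 510, 298, 410, 640, 580)
)

wanted <- c("TDS", "Cond")

# One row per site/sampdate, one column per parameter; gaps become NA.
wide <- dcast(subset(chemdata, param %in% wanted),
              site + sampdate ~ param, value.var = "quant")
```

   If a site/date/param combination occurs more than once, dcast() needs an
aggregation function (fun.aggregate = mean, say); without one it falls back
to counting rows, which is probably not what is wanted here.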

> To do your testing it might be wise to apply more selective use of subset.
> Perhaps only go for a few sites and dates.

   OK. I need to read and increase my understanding of with() and learn
dcast(). May not get to all this over the weekend, but I'll be back with
results.

Thanks very much,

Rich


