[R] Working With Variables Having Different Lengths

Sat Oct 22 04:05:20 CEST 2011

On Oct 21, 2011, at 8:14 PM, Rich Shepard wrote:

> On Fri, 21 Oct 2011, David Winsemius wrote:
>
>> The only variable in that dataframe with what appears to be a continuous
>> value (which is how I would expect "total dissolved solids" to be
>> measured) is "quant". Are you saying that the value of quant is measuring
>> something with different units depending on the value of 'param', and that
>> 'site' and 'date' should be used to identify associated measurements? This
>> would appear to be the case based on what you are saying below.
>
> David,
>
>  'Quant' is the measured concentration of the different chemicals
> identified in 'param'. I want to plot (and model) the quant values
> associated with 'TDS' and other chemicals, preferably from samples at the
> same location and date. Units are mg/L except for pH (standard units) and
> specific conductance (microSiemens/cm).
>
>  What I'm not understanding is how to specify the 'quant' values for the
> params 'TDS' and 'Cond' (for example) for an xyplot() or lm().
>
>> If this is so, the problem is to break apart the dataframe by type
>> of measurement ('param'). One way would be to split into separate
>> dataframes, then merge back together by an appropriate linkage on
>> site and date. I'm guessing that 'stream' and 'basin' are
>> superfluous for the matching and can be later associated with 'site'?
>
>  Yes, stream and basin are supersets of site. I used subset() to create
> separate dataframes from a set I called 'streamdata' (which aggregated the
> sites in an individual stream into one), but I'm not satisfied with how I
> did that and would rather learn to work with the overall 'chemdata' set.
>
>> The goal would be a dataframe with 7 renamed 'param' columns
>> ('TDS', 'Cond', 'Mg', 'SO4', 'Cl', 'Na', and 'Ca') and two
>> identifier columns ('site' and 'sampdate'). For the moment I would
>> think you would want all the data together and not make any
>> decisions about excluding NA values until you get an overall
>> picture of the situation.
>
>  I agree that's what I want.
>
>> The first thing I would try would be
>>
>> with(subset(chemdata, param %in% c('TDS', 'Cond', 'Mg', 'SO4',
>> 'Cl', 'Na', 'Ca') , 1:4) ,
>>   xtabs(quant ~ site + sampdate + param) )
>>
>> You would get 7 tables, one for each 'param', with up to 143 rows and
>> as many columns as you have sampdates.
>>
>> This might be a good use for package reshape2 since it generally  
>> returns a dataframe. The above operation would return an array with  
>> 3 dimensions. You might get immediate success with something like:
>
>> dcast( subset(chemdata, param %in% c('TDS', 'Cond', 'Mg', 'SO4',
>> 'Cl', 'Na', 'Ca') , 1:4) ,
>>   site + sampdate ~ param)
>> # the omitted variable name should end up in the values columns
>
>> To do your testing it might be wise to apply more selective use of
>> subset. Perhaps only go for a few sites and dates.
>
>  OK. I need to read and increase my understanding of with() and learn
> dcast(). May not get to all this over the weekend, but I'll be back with
> results.

`with` is not what's doing the work. I just use `with` to simplify the
code. It is like a local version of `attach`. Within the
"parenthetical" enclosure of the `with` function you can refer to the
(unquoted) column names as objects. I could have referred to them with
chemdata[["param"]] instead of with(chemdata,  .....  param ...)
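
For instance, here is a minimal sketch of picking out just the 'TDS'
values of quant, once inside `with` and once spelled out in full. The
small data frame is made up for illustration: it reuses the name
chemdata and your column names, but the values are invented and the
object names tds_with and tds_long are only placeholders.

```r
# Toy stand-in for chemdata; the values are invented, not real measurements.
chemdata <- data.frame(
  site  = c("A", "A", "B", "B"),
  param = c("TDS", "Cond", "TDS", "Cond"),
  quant = c(250, 400, 310, 520),
  stringsAsFactors = FALSE
)

# Inside with(), the columns are visible as ordinary objects.
tds_with <- with(chemdata, quant[param == "TDS"])

# The same selection written without with().
tds_long <- chemdata[["quant"]][chemdata[["param"]] == "TDS"]

# Both routes should give the same vector.
identical(tds_with, tds_long)
```

The only difference is how much typing it takes to name the columns.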

These are all equivalent:

with(chemdata, table(site, param))

table(chemdata$site, chemdata$param)

table(chemdata[["site"]], chemdata[["param"]])
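
If you would rather stay in base R while you read up on dcast(), the
split-then-merge idea quoted above could be sketched roughly as
follows. This is only an illustration on invented data: the toy
chemdata below stands in for your real one, only two of the seven
params are shown, and the names tds, cond, wide and fit are
placeholders of my own.

```r
# Toy stand-in for chemdata; the values are invented, not real measurements.
chemdata <- data.frame(
  site     = c("A", "A", "B", "B", "A", "A", "B"),
  sampdate = as.Date(c("2011-01-01", "2011-01-01", "2011-01-01",
                       "2011-01-01", "2011-02-01", "2011-02-01",
                       "2011-02-01")),
  param    = c("TDS", "Cond", "TDS", "Cond", "TDS", "Cond", "TDS"),
  quant    = c(250, 400, 310, 520, 260, 430, 330),
  stringsAsFactors = FALSE
)

# Split: one small data frame per param, keeping the identifier columns.
tds  <- subset(chemdata, param == "TDS",  select = c(site, sampdate, quant))
cond <- subset(chemdata, param == "Cond", select = c(site, sampdate, quant))

# Rename 'quant' so each measurement carries its own column name.
names(tds)[names(tds) == "quant"]   <- "TDS"
names(cond)[names(cond) == "quant"] <- "Cond"

# Merge: link on site and date; all = TRUE keeps unmatched rows as NA.
wide <- merge(tds, cond, by = c("site", "sampdate"), all = TRUE)

# With one column per param, the usual formula interface applies.
# lm() drops the rows with NA by default.
fit <- lm(TDS ~ Cond, data = wide)
coef(fit)
```

The same formula, TDS ~ Cond with data = wide, should also be what
xyplot() from lattice expects.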

-- 
David.
>
> Thanks very much,
>
> Rich
>

David Winsemius, MD
West Hartford, CT