[R] Variance of multiple non-contiguous time periods?

CJ Davies cjohndavies at gmail.com
Tue Nov 4 18:16:45 CET 2014


On 04/11/14 17:02, David Winsemius wrote:
>
> On Nov 4, 2014, at 8:35 AM, CJ Davies wrote:
>
>> On 04/11/14 16:13, PIKAL Petr wrote:
>>> Hi
>>>
>>>> -----Original Message-----
>>>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of CJ Davies
>>>> Sent: Tuesday, November 04, 2014 2:50 PM
>>>> To: Jim Lemon; r-help at r-project.org
>>>> Subject: Re: [R] Variance of multiple non-contiguous time periods?
>>>>
>>>> On 04/11/14 09:11, Jim Lemon wrote:
>>>>> On Mon, 3 Nov 2014 12:45:03 PM CJ Davies wrote:
>>>>>> ...
>>>>>> On 30/10/14 21:33, Jim Lemon wrote:
>>>>>> If I understand, you mean to calculate deviations for each individual
>>>>>> 'chunk' of each transition & then aggregate the results? This is what
>>>>>> I'd been thinking about, but is there a sensible manner within R to
>>>>>> achieve this, or is it something for which it would be easier to
>>>>>> preprocess the data in an external tool? Is there some way to subset the
>>>>>> data such that I can work over just contiguous 'chunks'?
>>>>>>
>>>>> Exactly. If there is some combination of existing variables that can
>>>>> be combined to make a set of unique values for each "chunk", you can
>>>>> calculate the deviations within each "chunk", then average the squared
>>>>> deviations for each type of "chunk", weighting by the duration of the
>>>>> "chunks" so that you don't bias the pooled variance toward the longer
>>>>> "chunks".
>>>>>
>>>>> Jim
>>>>>
>>>>
>>>> I am stumped for a way of automating this process though. Each line of
>>>> log data looks like this;
>>>>
>>>> 2406  55.4  (-11.2, 1.0, -0.9)  (-4.1, 1.0, 0.0)  7.077912  0.9203392  (0.0, 0.7, -0.1, 0.7)  8.129684  89.41537  -8.212769  (0.0, 0.7, -0.1, 0.7)  8.129684  89.41537  351.7872  1  0  0  False  0.15  3  37.76761  True  False  0  transition 1
>>>
>>> First you need to import it into R, which could be tricky judging by the line above.
>>> Some values will probably need to be processed with regular expressions.
>>>
>>> If I understand correctly, the number after "transition" is a signal which identifies contiguous chunks. If that is true, then
>>>
>>> ?rle describes a function which computes the lengths of such chunks.
>>>
>>> Cheers
>>> Petr
>>>
>>>>
>>>> Where the last variable defines which transition is currently active.
>>>> However to separate these data into 'chunks' would involve making a
>>>> comparison between each line of data & the preceding line of data to
>>>> determine whether it is part of the same contiguous 'chunk'. Is this
>>>> something that would be better achieved using external preprocessing
>>>> written in a language I am more familiar with, as I haven't the
>>>> foggiest how I would approach this within R?
>>>>
>>>> Regards,
>>>> CJ Davies
>>>>
>>>> ______________________________________________
>>>
> snipped
>>>
>>
>> Importing into R wasn't an issue; some of the fields contain spaces & symbols, but all the fields are tab separated so I can simply use;
>>
>> foo <- read.csv("bar",header=T,sep="\t")
>>
>> I've just written a hacky bit of Java that gives me the lines of each 'chunk' as a separate list & I think I'll then calculate these particular values using Java's Math class rather than trying to come up with a sensible way to import these 'chunks' back into R. When it comes to string/list manipulation like this I think my knowledge in Java & lack of knowledge in R makes the former the better option!
>>
>
> If you had offered the output of dput(head(foo, 20) ) and explained what defined a "chunk-defining transition", it would have been fairly easy to show you how to use cumsum in an ave() call to construct a grouping variable.
>
>
>> Regards,
>> CJ Davies
>>
>> ______________________________
>
>
> David Winsemius
> Alameda, CA, USA
>

Here are 100 example lines of the input: http://paste2.org/2LZVGP5K

The final value on each line, under the header "environment", is always 
one of ["real", "transition 1", "transition 2", "transition 3", 
"transition 4"]. A 'chunk-defining transition' is when this value changes.

If there is a way to do this in R in a more elegant fashion than my 
hacky Java, then I would be glad to learn.
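
For reference, the rle()/cumsum idea suggested earlier in the thread could be 
sketched roughly as below. This is only an illustration on made-up data: the 
column name "value" is a hypothetical stand-in for whichever measurement is 
of interest, and pooling as total within-chunk sum of squares over total 
degrees of freedom is just one reading of the weighting Jim describes.

```r
# Toy stand-in for the imported log. "environment" is the real header;
# "value" is a hypothetical name for the measurement of interest.
foo <- data.frame(
  value = c(1, 2, 3, 10, 12, 4, 6, 20, 20, 22),
  environment = c("real", "real", "real",
                  "transition 1", "transition 1",
                  "real", "real",
                  "transition 1", "transition 1", "transition 1"),
  stringsAsFactors = FALSE
)

# as.character() in case read.csv imported the column as a factor,
# which rle() does not accept.
env <- as.character(foo$environment)

# Each run of identical consecutive values is one contiguous chunk.
runs <- rle(env)
foo$chunk <- rep(seq_along(runs$lengths), runs$lengths)

# Equivalent cumsum form: count the rows where the value changes.
# foo$chunk <- cumsum(c(TRUE, env[-1] != env[-length(env)]))

# Squared deviation of each row from the mean of its own chunk.
foo$sqdev <- (foo$value - ave(foo$value, foo$chunk))^2

# One row per chunk: its environment, length and sum of squares.
chunks <- data.frame(
  environment = runs$values,
  n  = runs$lengths,
  ss = as.vector(tapply(foo$sqdev, foo$chunk, sum)),
  stringsAsFactors = FALSE
)

# Pool per environment: total sum of squares over total degrees of freedom.
pooled <- sapply(split(chunks, chunks$environment),
                 function(d) sum(d$ss) / sum(d$n - 1))
print(pooled)
```

Chunks of length one contribute no degrees of freedom here, so an 
environment made up only of single-line chunks would come out as NaN.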

Regards,
CJ Davies


