[R] Variance of multiple non-contiguous time periods?

David Winsemius dwinsemius at comcast.net
Tue Nov 4 18:42:39 CET 2014


On Nov 4, 2014, at 9:16 AM, CJ Davies wrote:

> On 04/11/14 17:02, David Winsemius wrote:
>> 
>> On Nov 4, 2014, at 8:35 AM, CJ Davies wrote:
>> 
>>> On 04/11/14 16:13, PIKAL Petr wrote:
>>>> Hi
>>>> 
>>>>> -----Original Message-----
>>>>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of CJ Davies
>>>>> Sent: Tuesday, November 04, 2014 2:50 PM
>>>>> To: Jim Lemon; r-help at r-project.org
>>>>> Subject: Re: [R] Variance of multiple non-contiguous time periods?
>>>>> 
>>>>> On 04/11/14 09:11, Jim Lemon wrote:
>>>>>> On Mon, 3 Nov 2014 12:45:03 PM CJ Davies wrote:
>>>>>>> ...
>>>>>>> On 30/10/14 21:33, Jim Lemon wrote:
>>>>>>> If I understand, you mean to calculate deviations for each individual
>>>>>>> 'chunk' of each transition & then aggregate the results? This is what
>>>>>>> I'd been thinking about, but is there a sensible way within R to
>>>>>>> achieve this, or would it be easier to preprocess the data in an
>>>>>>> external tool? Is there some way to subset the data such that I can
>>>>>>> work over just contiguous 'chunks'?
>>>>>>> 
>>>>>> Exactly. If there is some combination of existing variables that can
>>>>>> be combined to make a set of unique values for each "chunk", you can
>>>>>> calculate the deviations within each "chunk", then average the squared
>>>>>> deviations for each type of "chunk", weighting by the duration of the
>>>>>> "chunks" so that you don't bias the pooled variance toward the longer
>>>>>> "chunks".
>>>>>> 
>>>>>> Jim
>>>>>> 
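Jim's pooled within-chunk variance can be written out as below. This is a sketch that assumes chunk k of one environment type holds n_k evenly spaced samples x_{k1} to x_{kn_k} with mean \bar{x}_k, and that K is the number of chunks of that type. Weighting each chunk variance s_k^2 by its degrees of freedom n_k - 1 then approximates weighting by duration.

```latex
s_k^2 = \frac{1}{n_k - 1} \sum_{i=1}^{n_k} \left( x_{ki} - \bar{x}_k \right)^2,
\qquad
s_p^2 = \frac{\sum_{k=1}^{K} \sum_{i=1}^{n_k} \left( x_{ki} - \bar{x}_k \right)^2}{\sum_{k=1}^{K} (n_k - 1)}
      = \frac{\sum_{k=1}^{K} (n_k - 1)\, s_k^2}{\sum_{k=1}^{K} (n_k - 1)}
```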
>>>>> 
>>>>> I am stumped for a way of automating this process, though. Each line of
>>>>> log data looks like this:
>>>>> 
>>>>> 2406	55.4	(-11.2, 1.0, -0.9)	(-4.1, 1.0, 0.0)	7.077912	0.9203392	(0.0, 0.7, -0.1, 0.7)	8.129684	89.41537	-8.212769	(0.0, 0.7, -0.1, 0.7)	8.129684	89.41537	351.7872	1	0	0	False	0.15	3	37.76761	True	False	0	transition 1
>>>> 
>>>> First you need to import it into R, which could be tricky given the line above.
>>>> Some values will probably need to be processed with regular expressions.
>>>> 
>>>> If I understand correctly, the number after "transition" is a signal that identifies contiguous chunks. If that is true, then
>>>> 
>>>> ?rle (run length encoding) is a function that can compute the lengths of those chunks.
>>>> 
>>>> Cheers
>>>> Petr
>>>> 
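A minimal sketch of Petr's rle() suggestion, assuming a hypothetical vector env that holds the environment label of each log line. rle() collapses runs of identical consecutive values into lengths and values, and rep() expands the run lengths back into one chunk id per line. rle() refuses factors, which read.csv() produced by default before R 4.0.0, hence the as.character() call.

```r
# Hypothetical environment labels, one per log line
env <- c("real", "real", "transition 1", "transition 1", "transition 1",
         "real", "transition 2", "transition 2", "real", "real")

# rle() only accepts plain atomic vectors, so convert factors first
r <- rle(as.character(env))
r$lengths  # 2 3 1 2 2
r$values   # "real" "transition 1" "real" "transition 2" "real"

# Expand the run lengths into a chunk id for every line
chunk <- rep(seq_along(r$lengths), r$lengths)
chunk      # 1 1 2 2 2 3 4 4 5 5
```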
>>>>> 
>>>>> The last variable defines which transition is currently active.
>>>>> However, separating these data into 'chunks' would involve comparing
>>>>> each line of data with the preceding line to determine whether it is
>>>>> part of the same contiguous 'chunk'. Is this something that would be
>>>>> better achieved with external preprocessing written in a language I
>>>>> am more familiar with, as I haven't the foggiest how I would approach
>>>>> this within R?
>>>>> 
>>>>> Regards,
>>>>> CJ Davies
>>>>> 
>>>> 
>> snipped
>>>> 
>>> 
>>> Importing into R wasn't an issue; some of the fields contain spaces & symbols, but all the fields are tab-separated, so I can simply use:
>>> 
>>> foo <- read.csv("bar", header = TRUE, sep = "\t")
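read.csv() with sep = "\t" uses the same defaults as read.delim() (header, quote, dec, fill and comment.char all match), so the import can also be sketched as below. The sample is hypothetical: a shortened copy of CJ's log line plus a made-up second record, and every column name except "environment" is made up. It shows that splitting on tabs keeps a field such as "(-11.2, 1.0, -0.9)" intact.

```r
# Hypothetical, shortened tab-separated sample; column names other than
# "environment" are invented for illustration
txt <- paste("frame\ttime\tpos\tenvironment",
             "2406\t55.4\t(-11.2, 1.0, -0.9)\ttransition 1",
             "2407\t55.5\t(-11.1, 1.0, -0.9)\treal",
             sep = "\n")

# read.delim() splits on tabs only, so commas, spaces and parentheses
# inside a field stay part of that field
foo <- read.delim(text = txt, stringsAsFactors = FALSE)
str(foo)
```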
>>> 
>>> I've just written a hacky bit of Java that gives me the lines of each 'chunk' as a separate list, and I think I'll then calculate these particular values using Java's Math class rather than trying to come up with a sensible way to import these 'chunks' back into R. When it comes to string/list manipulation like this, I think my knowledge of Java & lack of knowledge of R make the former the better option!
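For comparison, an R counterpart of "the lines of each 'chunk' as a separate list" can be sketched with split(), which returns a list holding one data frame per chunk; sapply() then computes one summary per chunk. The data frame foo and its column value are hypothetical stand-ins for the real log.

```r
# Hypothetical stand-in for the imported log: one measurement per line
foo <- data.frame(
  value = c(1.0, 2.0, 3.0, 10.0, 12.0, 2.5, 3.5, 11.0, 13.0, 15.0),
  environment = c("real", "real", "real", "transition 1", "transition 1",
                  "real", "real", "transition 1", "transition 1", "transition 1"),
  stringsAsFactors = FALSE
)

# Runs of identical consecutive labels share one chunk id
r <- rle(foo$environment)
chunk <- rep(seq_along(r$lengths), r$lengths)

# A list with one data frame per contiguous chunk
chunks <- split(foo, chunk)

# One summary row per chunk: its label, line count and variance of 'value'
summ <- data.frame(
  environment = sapply(chunks, function(d) d$environment[1]),
  n = sapply(chunks, nrow),
  variance = sapply(chunks, function(d) var(d$value))
)
summ
```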
>>> 
>> 
>> If you had offered the output of dput(head(foo, 20) ) and explained what defined a "chunk-defining transition", it would have been fairly easy to show you how to use cumsum in an ave() call to construct a grouping variable.
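One way to read this suggestion, sketched with hypothetical vectors env and value: compare each label with the one on the previous line, let cumsum() turn the change points into a running chunk number, and let ave() return the within-chunk mean aligned with the original rows. This vectorised comparison takes the place of the line-by-line loop CJ described.

```r
# Hypothetical per-line labels and measurements
env <- c("real", "real", "real", "transition 1", "transition 1",
         "real", "real", "transition 1", "transition 1", "transition 1")
value <- c(1.0, 2.0, 3.0, 10.0, 12.0, 2.5, 3.5, 11.0, 13.0, 15.0)

# TRUE on the first line and on every line whose label differs from the
# previous line; cumsum() turns these change points into chunk numbers
chunk <- cumsum(c(TRUE, env[-1] != env[-length(env)]))
chunk  # 1 1 1 2 2 3 3 4 4 4

# ave() gives each row the mean of its chunk, so this is each line's
# deviation from the mean of its own contiguous chunk
dev <- value - ave(value, chunk)
dev    # -1.0 0.0 1.0 -1.0 1.0 -0.5 0.5 -2.0 0.0 2.0
```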
>> 
>> 
>>> Regards,
>>> CJ Davies
>>> 
>> 
>> 
>> David Winsemius
>> Alameda, CA, USA
>> 
> 
> Here are 100 example lines of the input: http://paste2.org/2LZVGP5K
> 
> The final value on each line, under the header "environment", is always one of ["real", "transition 1", "transition 2", "transition 3", "transition 4"]. A 'chunk-defining transition' is when this value changes.
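With chunks defined by changes in the environment value, a pooled variance per environment type can be sketched as follows, using hypothetical vectors env and value. Each chunk's variance s_k^2 is weighted by its degrees of freedom n_k - 1, which is proportional to duration only if lines are evenly spaced in time. A single-line chunk has no within-chunk spread, so var() returns NA for it; the sketch sets that contribution to zero since its weight is zero anyway.

```r
# Hypothetical per-line labels and measurements
env <- c("real", "real", "real", "transition 1", "transition 1",
         "real", "real", "transition 1", "transition 1", "transition 1")
value <- c(1.0, 2.0, 3.0, 10.0, 12.0, 2.5, 3.5, 11.0, 13.0, 15.0)

# Chunks are runs of identical consecutive labels
r <- rle(env)
n <- r$lengths
chunk <- rep(seq_along(n), n)

# Variance of each chunk, in chunk order (same order as r$values and n)
s2 <- as.vector(tapply(value, chunk, var))
# Single-line chunks yield NA but carry zero weight
s2[n < 2] <- 0

# Pool within each environment label, weighting s_k^2 by n_k - 1
w <- n - 1
pooled_var <- tapply(w * s2, r$values, sum) / tapply(w, r$values, sum)
pooled_var  # real 0.8333333, transition 1 3.3333333
```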
> 
> If there is a way to do this in R in a more elegant fashion than my hacky Java, then I would be glad to learn.

That pasted material does not appear to preserve the tabs. Reading it with your suggested code "does not work", in the sense that it brings in an object like this:

> download.file("http://paste2.org/2LZVGP5K", "bar.txt")
trying URL 'http://paste2.org/2LZVGP5K'
Content type 'text/html; charset=UTF-8' length unknown
opened URL
.......... .......... ........
downloaded 28 Kb

> foo <- read.csv("bar.txt",header=T,sep="\t")
> str(foo)
'data.frame':	2829 obs. of  1 variable:
 $ X..DOCTYPE.html.: Factor w/ 669 levels "","          ",..: 106 104 219 233 220 222 221 215 217 79 ...

I SAY AGAIN:

Need: the output of dput(head(foo, 100))


> 
> Regards,
> CJ Davies

David Winsemius
Alameda, CA, USA


