[R] Variance of multiple non-contiguous time periods?

Wed Nov 5 00:41:40 CET 2014

On 04/11/14 17:42, David Winsemius wrote:
> On Nov 4, 2014, at 9:16 AM, CJ Davies wrote:
>
>> On 04/11/14 17:02, David Winsemius wrote:
>>> On Nov 4, 2014, at 8:35 AM, CJ Davies wrote:
>>>
>>>> On 04/11/14 16:13, PIKAL Petr wrote:
>>>>> Hi
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
>>>>>> On Behalf Of CJ Davies
>>>>>> Sent: Tuesday, November 04, 2014 2:50 PM
>>>>>> To: Jim Lemon; r-help at r-project.org
>>>>>> Subject: Re: [R] Variance of multiple non-contiguous time periods?
>>>>>>
>>>>>> On 04/11/14 09:11, Jim Lemon wrote:
>>>>>>> On Mon, 3 Nov 2014 12:45:03 PM CJ Davies wrote:
>>>>>>>> ...
>>>>>>>> On 30/10/14 21:33, Jim Lemon wrote:
>>>>>>>> If I understand, you mean to calculate deviations for each individual
>>>>>>>> 'chunk' of each transition & then aggregate the results? This is what
>>>>>>>> I'd been thinking about, but is there a sensible manner within R to
>>>>>>>> achieve this, or is it something for which it would be easier to
>>>>>>>> preprocess the data in an external tool? Is there some way to subset the
>>>>>>>> data such that I can work over just contiguous 'chunks'?
>>>>>>>>
>>>>>>> Exactly. If there is some combination of existing variables that can
>>>>>>> be combined to make a set of unique values for each "chunk", you can
>>>>>>> calculate the deviations within each "chunk", then average the squared
>>>>>>> deviations for each type of "chunk", weighting by the duration of the
>>>>>>> "chunks" so that you don't bias the pooled variance toward the longer
>>>>>>> "chunks".
>>>>>>>
>>>>>>> Jim
>>>>>>>
>>>>>> I am stumped for a way of automating this process though. Each line of
>>>>>> log data looks like this;
>>>>>>
>>>>>> 2406  55.4    (-11.2, 1.0, -0.9)      (-4.1, 1.0, 0.0)        7.077912
>>>>>>       0.9203392       (0.0,
>>>>>> 0.7, -0.1, 0.7)       8.129684        89.41537        -8.212769       (0.0, 0.7, -0.1,
>>>>>> 0.7)
>>>>>> 8.129684      89.41537        351.7872        1       0       0       False   0.15    3
>>>>>>       37.76761        True    False   0
>>>>>> transition 1
>>>>> First you need to import it into R, which could be tricky judging by the line above.
>>>>> Some values will probably need to be processed with regular expressions.
>>>>>
>>>>> If I understand correctly, the number after "transition" is a signal that identifies contiguous chunks. If that is true, then
>>>>>
>>>>> ?rle describes a function that can compute the lengths of the chunks.
>>>>>
>>>>> Cheers
>>>>> Petr
>>>>>
>>>>>> Where the last variable defines which transition is currently active.
>>>>>> However to separate these data into 'chunks' would involve making a
>>>>>> comparison between each line of data & the preceding line of data to
>>>>>> determine whether it is part of the same contiguous 'chunk'. Is this
>>>>>> something that would be better achieved using external preprocessing
>>>>>> written in a language I am more familiar with, as I haven't the
>>>>>> foggiest how I would approach this within R?
>>>>>>
>>>>>> Regards,
>>>>>> CJ Davies
>>>>>>
>>>>>> ______________________________________________
>>> snipped
>>>> Importing into R wasn't an issue; some of the fields contain spaces & symbols, but all the fields are tab separated so I can simply use;
>>>>
>>>> foo <- read.csv("bar",header=T,sep="\t")
>>>>
>>>> I've just written a hacky bit of Java that gives me the lines of each 'chunk' as a separate list & I think I'll then calculate these particular values using Java's Math class rather than trying to come up with a sensible way to import these 'chunks' back into R. When it comes to string/list manipulation like this I think my knowledge in Java & lack of knowledge in R makes the former the better option!
>>>>
>>> If you had offered the output of dput(head(foo, 20) ) and explained what defined a "chunk-defining transition", it would have been fairly easy to show you how to use cumsum in an ave() call to construct a grouping variable.
>>>
>>>
>>>> Regards,
>>>> CJ Davies
>>>>
>>>> ______________________________
>>>
>>> David Winsemius
>>> Alameda, CA, USA
>>>
>> Here is an example 100 lines of the input --> http://paste2.org/2LZVGP5K
>>
>> The final value on each line, under the header "environment", is always one of ["real", "transition 1", "transition 2", "transition 3", "transition 4"]. A 'chunk-defining transition' is when this value changes.
>>
>> If there is a way to do this in R in a more elegant fashion than my hacky Java, then I would be glad to learn.
> That pasted material does not appear to preserve the tabs. Input with your suggested code "does not work" in the sense that it brings in an object like this. 
>
>> download.file("http://paste2.org/2LZVGP5K", "bar.txt")
> trying URL 'http://paste2.org/2LZVGP5K'
> Content type 'text/html; charset=UTF-8' length unknown
> opened URL
> .......... .......... ........
> downloaded 28 Kb
>
>> foo <- read.csv("bar.txt",header=T,sep="\t")
>> str(foo)
> 'data.frame':	2829 obs. of  1 variable:
>  $ X..DOCTYPE.html.: Factor w/ 669 levels "","          ",..: 106 104 219 233 220 222 221 215 217 79 ...
>
> I SAY AGAIN:
>
> Need: output of dput(head(foo, 100))
>
>
>> Regards,
>> CJ Davies
> David Winsemius
> Alameda, CA, USA
>
That was a pastebin URI, so what you downloaded was HTML instead of raw
text. This is the raw text;

http://cjdavies.org/foo

Regards,
CJ Davies
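
The chunking discussed in this thread (a new 'chunk' starts whenever "environment" changes) can be sketched in R roughly as follows. This is only a minimal sketch under stated assumptions: the toy data frame and the column name "value" are hypothetical stand-ins for the real log, and the pooled variance shown is one reading of Jim's suggestion (summed squared within-chunk deviations divided by the summed within-chunk degrees of freedom), not necessarily the exact weighting he had in mind.

```r
# Toy stand-in for the log; "value" is a hypothetical measurement column.
foo <- data.frame(
  value = c(1, 2, 3, 10, 12, 2, 4, 20, 22, 24),
  environment = c("real", "real", "real",
                  "transition 1", "transition 1",
                  "real", "real",
                  "transition 1", "transition 1", "transition 1"),
  stringsAsFactors = FALSE
)

# A chunk starts on the first row and on every row whose environment
# differs from the previous row; cumsum() turns those starts into ids.
n <- nrow(foo)
changed <- c(TRUE, foo$environment[-1] != foo$environment[-n])
foo$chunk <- cumsum(changed)   # 1 1 1 2 2 3 3 4 4 4

# Equivalent grouping via rle(), as Petr suggested:
# r <- rle(foo$environment); rep(seq_along(r$lengths), r$lengths)

# Squared deviation of each row from its own chunk's mean.
foo$sqdev <- (foo$value - ave(foo$value, foo$chunk))^2

# Pool per environment: total squared deviation over total degrees of
# freedom (rows minus number of chunks). Assumes every environment has
# at least one chunk with more than one row.
ss     <- tapply(foo$sqdev, foo$environment, sum)
rows   <- tapply(foo$chunk, foo$environment, length)
chunks <- tapply(foo$chunk, foo$environment, function(x) length(unique(x)))
pooled_var <- ss / (rows - chunks)
print(pooled_var)
```

With the real data, the same steps would apply to the data frame returned by read.csv(), replacing "value" with whichever column's variance is of interest.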