[R] Variance of multiple non-contiguous time periods?
David Winsemius
dwinsemius at comcast.net
Wed Nov 5 01:29:15 CET 2014
On Nov 4, 2014, at 3:41 PM, CJ Davies wrote:
> On 04/11/14 17:42, David Winsemius wrote:
>> On Nov 4, 2014, at 9:16 AM, CJ Davies wrote:
>>
>>> On 04/11/14 17:02, David Winsemius wrote:
>>>> On Nov 4, 2014, at 8:35 AM, CJ Davies wrote:
>>>>
>>>>> On 04/11/14 16:13, PIKAL Petr wrote:
>>>>>> Hi
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of CJ Davies
>>>>>>> Sent: Tuesday, November 04, 2014 2:50 PM
>>>>>>> To: Jim Lemon; r-help at r-project.org
>>>>>>> Subject: Re: [R] Variance of multiple non-contiguous time periods?
>>>>>>>
>>>>>>> On 04/11/14 09:11, Jim Lemon wrote:
>>>>>>>> On Mon, 3 Nov 2014 12:45:03 PM CJ Davies wrote:
>>>>>>>>> ...
>>>>>>>>> On 30/10/14 21:33, Jim Lemon wrote:
>>>>>>>>> If I understand, you mean to calculate deviations for each individual
>>>>>>>>> 'chunk' of each transition & then aggregate the results? This is what
>>>>>>>>> I'd been thinking about, but is there a sensible manner within R to
>>>>>>>>> achieve this, or is it something for which it would be easier to
>>>>>>>>> preprocess the data in an external tool? Is there some way to subset the
>>>>>>>>> data such that I can work over just contiguous 'chunks'?
>>>>>>>>>
>>>>>>>> Exactly. If there is some combination of existing variables that can
>>>>>>>> be combined to make a set of unique values for each "chunk", you can
>>>>>>>> calculate the deviations within each "chunk", then average the squared
>>>>>>>> deviations for each type of "chunk", weighting by the duration of the
>>>>>>>> "chunks" so that you don't bias the pooled variance toward the longer
>>>>>>>> "chunks".
>>>>>>>>
>>>>>>>> Jim
>>>>>>>>
>>>>>>> I am stumped for a way of automating this process though. Each line of
>>>>>>> log data looks like this;
>>>>>>>
>>>>>>> 2406 55.4 (-11.2, 1.0, -0.9) (-4.1, 1.0, 0.0) 7.077912 0.9203392 (0.0, 0.7, -0.1, 0.7) 8.129684 89.41537 -8.212769 (0.0, 0.7, -0.1, 0.7) 8.129684 89.41537 351.7872 1 0 0 False 0.15 3 37.76761 True False 0 transition 1
>>>>>> First you need to import it into R, which could be tricky given the line above.
>>>>>> Some values will probably need to be processed with regular expressions.
>>>>>>
>>>>>> If I understand correctly, the number after "transition" is a signal which delimits continuous chunks. If that is true, then
>>>>>>
>>>>>> ?rle documents a function which can compute the lengths of those chunks (runs).
>>>>>>
>>>>>> Cheers
>>>>>> Petr
>>>>>>
>>>>>>> Where the last variable defines which transition is currently active.
>>>>>>> However to separate these data into 'chunks' would involve making a
>>>>>>> comparison between each line of data & the preceding line of data to
>>>>>>> determine whether it is part of the same contiguous 'chunk'. Is this
>>>>>>> something that would be better achieved using external preprocessing
>>>>>>> written in a language I am more familiar with, as I haven't the
>>>>>>> foggiest how I would approach this within R?
>>>>>>>
>>>>>>> Regards,
>>>>>>> CJ Davies
>>>>>>>
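Petr's ?rle suggestion answers CJ's question about comparing each line with the preceding one without writing that comparison by hand. A minimal sketch, assuming the last column has already been read into a character vector; the toy `env` values below are hypothetical, not taken from the real log:

```r
# Hypothetical toy values standing in for the last ("environment") column.
env <- c("real", "real", "transition 1", "transition 1", "transition 1",
         "real", "transition 2", "transition 2", "real")

# rle() needs a plain atomic vector; a factor column (read.csv's default
# before R 4.0.0) should be converted with as.character() first.
runs <- rle(env)
runs$values   # label of each contiguous chunk
runs$lengths  # number of log lines in each chunk: 2 3 1 2 1

# Expand the runs back to one chunk id per line, so every line knows
# which contiguous chunk it belongs to.
chunk <- rep(seq_along(runs$lengths), times = runs$lengths)
chunk         # 1 1 2 2 2 3 4 4 5
```

Base R's inverse.rle() reverses the encoding if the original per-line sequence is needed again.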
>>>> snipped
>>>>> Importing into R wasn't an issue; some of the fields contain spaces & symbols, but all the fields are tab separated so I can simply use:
>>>>>
>>>>> foo <- read.csv("bar",header=T,sep="\t")
>>>>>
>>>>> I've just written a hacky bit of Java that gives me the lines of each 'chunk' as a separate list & I think I'll then calculate these particular values using Java's Math class rather than trying to come up with a sensible way to import these 'chunks' back into R. When it comes to string/list manipulation like this, I think my knowledge of Java & lack of knowledge of R make the former the better option!
>>>>>
>>>> If you had offered the output of dput(head(foo, 20) ) and explained what defined a "chunk-defining transition", it would have been fairly easy to show you how to use cumsum in an ave() call to construct a grouping variable.
>>>>
>>>>
>>>>> Regards,
>>>>> CJ Davies
>>>>>
>>>>
>>>> David Winsemius
>>>> Alameda, CA, USA
>>>>
>>> Here are 100 example lines of the input --> http://paste2.org/2LZVGP5K
>>>
>>> The final value on each line, under the header "environment", is always one of ["real", "transition 1", "transition 2", "transition 3", "transition 4"]. A 'chunk-defining transition' occurs whenever this value changes.
>>>
>>> If there is a way to do this in R in a more elegant fashion than my hacky Java, then I would be glad to learn.
>> That pasted material does not appear to preserve the tabs. Input with your suggested code "does not work" in the sense that it brings in an object like this.
>>
>>> download.file("http://paste2.org/2LZVGP5K", "bar.txt")
>> trying URL 'http://paste2.org/2LZVGP5K'
>> Content type 'text/html; charset=UTF-8' length unknown
>> opened URL
>> .......... .......... ........
>> downloaded 28 Kb
>>
>>> foo <- read.csv("bar.txt",header=T,sep="\t")
>>> str(foo)
>> 'data.frame': 2829 obs. of 1 variable:
>> $ X..DOCTYPE.html.: Factor w/ 669 levels ""," ",..: 106 104 219 233 220 222 221 215 217 79 ...
>>
>> I SAY AGAIN:
>>
>> Need: output of dput(head(foo, 100))
>>
>>
>>> Regards,
>>> CJ Davies
>> David Winsemius
>> Alameda, CA, USA
>>
> That was a pastebin URI, so what you downloaded was HTML instead of raw
> text. This is the raw text:
Well, it was text but it had no tabs. On this mailing list, HTML is considered evil.
> foo$chunk <- c(NA, foo$environment[-1] != head(foo$environment,-1) )
> table(foo$chunk)
FALSE  TRUE
  503   106
> foo$chunk <- cumsum(c(1, foo$chunk[-1]) )
> table(foo$chunk)
  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26
 20   1   6   1   1   1   4   1   1   2  16   1   7   4  14   2   6   1   2   4   1   4   2   8   6   2
 27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52
  2   1   7   1   1   2   2   2   6  10   3   1  12   3   1  10  18   6   1   6  14   4   1  19  13  10
 53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78
  6   2  10  14   3   2   1   2   1   1   1  15   4   2   2   6  21   5   1  16   5   3   1   2  21   3
 79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100 101 102 103 104
  1   2   3   4   4   3   5   1   9   1   3   3   7   2   5   6   6   5  13   1   1   8   1   2   2   3
105 106 107
  6   9  70
So now you have a chunking index and can use `by`, `ave`, or `for()` loops to compute per-chunk statistics.
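With that index, Jim's recipe from earlier in the thread can be applied chunk by chunk. A minimal sketch on toy data, assuming a hypothetical numeric column `value` (the thread does not say which log column is of interest). It uses one standard pooled estimator, the sum of squared within-chunk deviations divided by the sum of (n_i - 1) over chunks, which weights each chunk by its degrees of freedom:

```r
# Hypothetical toy data: 'value' stands in for whichever numeric log column
# is of interest; 'environment' mirrors the last column of the log.
foo <- data.frame(
  environment = c("real", "real", "transition 1", "transition 1",
                  "real", "real", "real", "transition 1", "transition 1"),
  value       = c(1, 3, 10, 12, 2, 4, 6, 20, 24),
  stringsAsFactors = FALSE
)

# Chunk index as in the message above: a new chunk starts wherever the
# environment differs from the previous line.
foo$chunk <- cumsum(c(TRUE, foo$environment[-1] != head(foo$environment, -1)))

# Deviation of each line from the mean of its own chunk.
foo$dev <- foo$value - ave(foo$value, foo$chunk, FUN = mean)

# Pooled within-chunk variance per environment type:
#   sum of squared deviations / (number of lines - number of chunks),
# which equals sum(dev^2) / sum(n_i - 1). Single-line chunks add 0 to both
# parts; an environment made only of single-line chunks gives NaN.
ss <- tapply(foo$dev^2, foo$environment, sum)
n  <- tapply(foo$chunk, foo$environment, length)
k  <- tapply(foo$chunk, foo$environment, function(ch) length(unique(ch)))
pooled_var <- ss / (n - k)
pooled_var
```

Here the two "real" chunks have variances 2 and 4 with 1 and 2 degrees of freedom, so the pooled value is (1*2 + 2*4)/3 = 10/3; both "transition 1" chunks have variance 2 and 8, giving (2 + 8)/2 = 5.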
>
> http://cjdavies.org/foo
That was displayed as if it had tabs, and after correcting the error of using T for TRUE, it did succeed.
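For a tab-separated file like this one, read.delim() is read.table() with sep = "\t" and header = TRUE already set, so fields containing spaces and parentheses stay intact. A minimal sketch on an inline sample; the column names `frame` and `value` are hypothetical, only `environment` comes from the thread:

```r
# Tiny tab-separated sample standing in for the log file.
txt <- "frame\tvalue\tenvironment\n1\t0.5\treal\n2\t0.7\ttransition 1\n"

# header = TRUE is read.delim's default; it is written out here for clarity.
# Spelling it TRUE rather than T matters because T is an ordinary variable
# that can be reassigned, whereas TRUE is a reserved constant.
foo <- read.delim(text = txt, header = TRUE, stringsAsFactors = FALSE)
str(foo)
```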
>
> Regards,
> CJ Davies
David Winsemius
Alameda, CA, USA