[R] Variance of multiple non-contiguous time periods?
David Winsemius
dwinsemius at comcast.net
Tue Nov 4 18:02:53 CET 2014
On Nov 4, 2014, at 8:35 AM, CJ Davies wrote:
> On 04/11/14 16:13, PIKAL Petr wrote:
>> Hi
>>
>>> -----Original Message-----
>>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of CJ Davies
>>> Sent: Tuesday, November 04, 2014 2:50 PM
>>> To: Jim Lemon; r-help at r-project.org
>>> Subject: Re: [R] Variance of multiple non-contiguous time periods?
>>>
>>> On 04/11/14 09:11, Jim Lemon wrote:
>>>> On Mon, 3 Nov 2014 12:45:03 PM CJ Davies wrote:
>>>>> ...
>>>>> On 30/10/14 21:33, Jim Lemon wrote:
>>>>> If I understand, you mean to calculate deviations for each individual
>>>>> 'chunk' of each transition & then aggregate the results? This is what
>>>>> I'd been thinking about, but is there a sensible way within R to
>>>>> achieve this, or would it be easier to preprocess the data in an
>>>>> external tool? Is there some way to subset the data so that I can
>>>>> work over just contiguous 'chunks'?
>>>>>
>>>> Exactly. If there is some combination of existing variables that can
>>>> be combined to make a set of unique values for each "chunk", you can
>>>> calculate the deviations within each "chunk", then average the squared
>>>> deviations for each type of "chunk", weighting by the duration of the
>>>> "chunks" so that you don't bias the pooled variance toward the longer
>>>> "chunks".
>>>>
>>>> Jim
>>>>
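Here is a minimal sketch of that pooling on made-up data. The names value, transition and chunk are hypothetical stand-ins for columns of the real log. It computes the usual pooled variance for each transition type: the sum of squared within-chunk deviations, sum((x_ij - mean_i)^2), divided by sum(n_i - 1). Each chunk is therefore weighted by its degrees of freedom. This equals weighting by duration only if the log is sampled at a constant rate, which is an assumption here.

```r
# Hypothetical toy data: one measurement per row, the active transition,
# and a chunk id that is constant within each contiguous run of a transition
value      <- c(1, 2, 3, 10, 12, 2, 4, 11, 13, 15)
transition <- c(1, 1, 1,  2,  2, 1, 1,  2,  2,  2)
chunk      <- c(1, 1, 1,  2,  2, 3, 3,  4,  4,  4)

# Deviation of each row from the mean of its own chunk
# (ave() uses mean as its default FUN)
dev <- value - ave(value, chunk)

# Per-chunk sample size and the transition each chunk belongs to
n_i  <- as.vector(tapply(value, chunk, length))
type <- as.vector(tapply(transition, chunk, function(t) t[1]))

# Pooled variance per transition: total within-chunk sum of squares
# divided by the total within-chunk degrees of freedom, sum(n_i - 1)
ss_type    <- tapply(dev^2, transition, sum)
df_type    <- tapply(n_i - 1, type, sum)
pooled_var <- ss_type / df_type
pooled_var
```

For this toy data, transition 1 pools chunks 1 and 3 (sum of squares 4 on 3 degrees of freedom). Transition 2 pools chunks 2 and 4 (10 on 3).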
>>>
>>> I am stumped as to how to automate this process, though. Each line of
>>> log data looks like this:
>>>
>>> 2406 55.4 (-11.2, 1.0, -0.9) (-4.1, 1.0, 0.0) 7.077912 0.9203392 (0.0, 0.7, -0.1, 0.7) 8.129684 89.41537 -8.212769 (0.0, 0.7, -0.1, 0.7) 8.129684 89.41537 351.7872 1 0 0 False 0.15 3 37.76761 True False 0 transition 1
>>
>> First you need to import it into R, which could be tricky judging by the line above.
>> Some values will probably need to be processed with regular expressions.
>>
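If any tuple-valued field such as (-11.2, 1.0, -0.9) has to become numbers, the sketch below shows one regular-expression approach. It assumes every tuple in a column has the same number of components. The x/y/z column names are made up.

```r
# Tuple fields as they might appear in a character column of the log
tuples <- c("(-11.2, 1.0, -0.9)", "(-4.1, 1.0, 0.0)")

# Drop the parentheses, split on commas (plus any following spaces),
# and convert each piece to a number
parts <- strsplit(gsub("[()]", "", tuples), ",\\s*")
mat   <- do.call(rbind, lapply(parts, as.numeric))
colnames(mat) <- c("x", "y", "z")   # hypothetical component names
mat
```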
>> If I understand correctly, the number after "transition" is a signal that identifies contiguous chunks. If that is true, then
>>
>> ?rle (run-length encoding) is a function that computes the length of each chunk.
>>
>> Cheers
>> Petr
>>
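Below is a small illustration of rle() on a hypothetical transition column, plus one way to turn its run lengths into a chunk id per row. rle() refuses factors. A column read as a factor (the read.csv() default for strings before R 4.0.0) would therefore need as.character() first.

```r
# Hypothetical transition column, one value per log line
transition <- c(1, 1, 1, 2, 2, 1, 1, 2, 2, 2)

r <- rle(transition)
r$lengths   # length of each contiguous chunk: 3 2 2 3
r$values    # transition of each chunk:        1 2 1 2

# Expand back to one chunk id per log line
chunk <- rep(seq_along(r$lengths), r$lengths)
chunk       # 1 1 1 2 2 3 3 4 4 4
```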
>>>
>>> Here the last variable defines which transition is currently active.
>>> However, separating these data into 'chunks' would involve comparing
>>> each line of data with the preceding line to determine whether it is
>>> part of the same contiguous 'chunk'. Would this be better achieved with
>>> external preprocessing written in a language I am more familiar with,
>>> as I haven't the foggiest how I would approach this within R?
>>>
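That row-by-row comparison does not need an explicit loop in R. In the sketch below, a hypothetical transition vector stands in for the real column. Comparing the vector with a copy of itself shifted by one position flags every row that starts a new chunk. This works for numeric and character columns alike.

```r
# Hypothetical transition column, one value per log line
transition <- c(1, 1, 1, 2, 2, 1, 1, 2, 2, 2)
n <- length(transition)

# Compare every row with the row before it in one vectorised step:
# transition[-1] drops the first element, transition[-n] drops the last,
# and the first row always starts a chunk
new_chunk <- c(TRUE, transition[-1] != transition[-n])

which(new_chunk)   # rows where a chunk starts: 1 4 6 8
```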
>>> Regards,
>>> CJ Davies
>>>
>>
snipped
>>
>
> Importing into R wasn't an issue; some of the fields contain spaces & symbols, but all the fields are tab-separated, so I can simply use:
>
> foo <- read.csv("bar",header=T,sep="\t")
>
> I've just written a hacky bit of Java that gives me the lines of each 'chunk' as a separate list, & I think I'll then calculate these particular values using Java's Math class rather than trying to come up with a sensible way to import these 'chunks' back into R. When it comes to string/list manipulation like this, I think my knowledge of Java & lack of knowledge of R make the former the better option!
>
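For comparison, the Java step (one list of lines per chunk) has a direct R counterpart in split(). The sketch below runs on a made-up stand-in for foo. It assumes chunk boundaries are exactly the rows where transition changes, and the column names value and transition are hypothetical.

```r
# Hypothetical stand-in for foo: a measurement and the active transition
foo <- data.frame(value      = c(1, 2, 3, 10, 12, 2, 4, 11, 13, 15),
                  transition = c(1, 1, 1,  2,  2, 1, 1,  2,  2,  2))

# Chunk id: starts at 1 and increases whenever transition changes
tr    <- foo$transition
chunk <- cumsum(c(TRUE, tr[-1] != tr[-length(tr)]))

# One data frame per contiguous chunk, like the Java list of lists
chunks <- split(foo, chunk)
length(chunks)                            # 4
sapply(chunks, function(d) var(d$value))  # per-chunk sample variance
```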
If you had offered the output of dput(head(foo, 20) ) and explained what defined a "chunk-defining transition", it would have been fairly easy to show you how to use cumsum in an ave() call to construct a grouping variable.
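Here is one reading of the cumsum-in-ave() suggestion, sketched on made-up data. The real column names of foo are unknown, so value and transition are assumptions, as is transition being numeric. Called with no grouping arguments, ave() simply applies FUN to the whole column. A second ave() call, grouped by the new chunk id, then gives within-chunk deviations and variances row by row.

```r
# Hypothetical stand-in for foo: a measurement and the active transition
foo <- data.frame(value      = c(1, 2, 3, 10, 12, 2, 4, 11, 13, 15),
                  transition = c(1, 1, 1,  2,  2, 1, 1,  2,  2,  2))

# Grouping variable built with cumsum() inside ave(): with no grouping
# arguments, ave() applies FUN to the whole column and keeps its type.
# Assumes transition is numeric; on a character column the ids would
# come back as strings.
foo$chunk <- ave(foo$transition,
                 FUN = function(x) cumsum(c(TRUE, x[-1] != x[-length(x)])))

# Within-chunk deviations and chunk variances, one value per row
foo$dev       <- ave(foo$value, foo$chunk, FUN = function(v) v - mean(v))
foo$chunk_var <- ave(foo$value, foo$chunk, FUN = var)
foo
```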
> Regards,
> CJ Davies
>
David Winsemius
Alameda, CA, USA