[R] ?max (so far...)
Erich Neuwirth
erich.neuwirth at univie.ac.at
Mon Jul 13 16:02:50 CEST 2009
Belated answer:
A few remarks regarding your questions:
Your running max problem could be solved in the following way
(a solution based on Duncan Murdoch's suggestion,
but a little more general):
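foldOrbit <- function(x, fun) {
  res <- numeric(length(x))
  if (length(x) == 0) return(res)
  res[1] <- x[1]
  # seq_along(x)[-1] is empty for length-1 input, unlike 2:length(x)
  for (i in seq_along(x)[-1]) res[i] <- fun(res[i - 1], x[i])
  res
}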
or more generally
applySliding <- function(x, fun, winlength = length(x)) {
  res <- numeric(length(x))
  for (i in seq_along(x)) {
    res[i] <- fun(x[max(1, i - winlength + 1):i])
  }
  res
}
foldOrbit(x,max)
will give you the running maxima of the vector x.
This works because the max of a whole sequence equals the max of two
values: the max of the sequence without its last element, and the last
element itself.
The same holds for min, sum, and prod (all of these are associative).
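As a side note, base R can express the same kind of fold without an explicit loop: Reduce() with accumulate = TRUE keeps every intermediate result. The following is only a minimal sketch; foldOrbit2 is a name made up for this illustration.

```r
# Hypothetical loop-free variant of foldOrbit, built on base R's Reduce().
# accumulate = TRUE returns all intermediate results of the left fold.
foldOrbit2 <- function(x, fun) {
  unlist(Reduce(fun, x, accumulate = TRUE))
}

x <- c(3, 1, 4, 1, 5, 9, 2, 6)
foldOrbit2(x, max)   # same values as cummax(x)
foldOrbit2(x, `+`)   # same values as cumsum(x)
```

For max, min, sum and prod the built-in cummax, cummin, cumsum and cumprod are the more direct choice; a generic fold is mainly useful for other two-argument functions.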
applySliding is more general. The second argument is the function you
want to apply "in running mode".
If you do not give winlength, it applies the function to all elements
up to the current one, and so gives correct results for
nonassociative functions as well.
If you give winlength, it uses only the last winlength
elements up to the current position.
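To illustrate why a nonassociative function needs the sliding version, here is a small sketch using mean. Folding pairwise (mean of the previous mean and the new element) gives the wrong answer, while recomputing over the window gives the right one. The input vector is made up for the example.

```r
# applySliding written out again so the snippet runs on its own
applySliding <- function(x, fun, winlength = length(x)) {
  res <- numeric(length(x))
  for (i in seq_along(x)) {
    res[i] <- fun(x[max(1, i - winlength + 1):i])
  }
  res
}

x <- c(2, 4, 6, 8, 10)

mean(c(mean(c(2, 4)), 6))   # pairwise fold: 4.5, not the true mean 4

applySliding(x, mean)       # mean of all elements so far: 2 3 4 5 6
applySliding(x, mean, 2)    # mean of the last 2 elements: 2 3 5 7 9
```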
Examples:
foldOrbit(1:10,max)
applySliding(1:10,max)
applySliding(1:10,max,3)
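With 1:10 all three calls return 1 to 10, because the input is increasing. A vector that goes up and down shows the effect of the window more clearly. This is only a sketch with made-up data; both functions are written out again so the snippet runs on its own (the seq_along(x)[-1] loop bound is a small defensive variation so that a length-one vector also works).

```r
foldOrbit <- function(x, fun) {
  res <- numeric(length(x))
  if (length(x) == 0) return(res)
  res[1] <- x[1]
  for (i in seq_along(x)[-1]) res[i] <- fun(res[i - 1], x[i])
  res
}

applySliding <- function(x, fun, winlength = length(x)) {
  res <- numeric(length(x))
  for (i in seq_along(x)) {
    res[i] <- fun(x[max(1, i - winlength + 1):i])
  }
  res
}

x <- c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3)

foldOrbit(x, max)         # 3 3 4 4 5 9 9 9 9 9  (max so far)
applySliding(x, max)      # 3 3 4 4 5 9 9 9 9 9  (same, window = whole vector)
applySliding(x, max, 3)   # 3 3 4 4 5 9 9 9 6 6  (max of the last 3 only)
```

Once the 9 has dropped out of the three-element window, the windowed result falls back to 6, while the cumulative versions stay at 9.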
And now, for something completely different:
You seem to want to combine Excel and R in your work.
You can possibly make your work easier if you use RExcel,
which is an add-in that lets you use R from within Excel.
Information is available at rcom.univie.ac.at
and there is a (half-hour-long) video demonstrating how to use
R from within Excel.
On Jul 1, 2009, at 10:26 PM, Mark Knecht wrote:
> On Wed, Jul 1, 2009 at 12:54 PM, Duncan
> Murdoch<murdoch at stats.uwo.ca> wrote:
>> On 01/07/2009 1:26 PM, Mark Knecht wrote:
>>>
>>> On Wed, Jul 1, 2009 at 9:39 AM, Duncan Murdoch<murdoch at stats.uwo.ca>
>>> wrote:
>>>>
>>>> On 01/07/2009 11:49 AM, Mark Knecht wrote:
>>>>>
>>>>> Hi,
>>>>> I have a data.frame that is date ordered by row number - earliest
>>>>> date first and most current last. I want to create a couple of new
>>>>> columns that show the max and min values from other columns *so far* -
>>>>> not for the whole data.frame.
>>>>>
>>>>> It seems this sort of question is really coming from my lack of
>>>>> understanding about how R intends me to limit myself to portions of a
>>>>> data.frame. I get the impression from the help files that the generic
>>>>> way is that if I'm on the 500th row of a 1000 row data.frame and want
>>>>> to limit the search max does to rows 1:500 I should use something
>>>>> like [1:row] but it's not working inside my function. The idea works
>>>>> outside the function, in the sense I can create tempt1[1:7] and the
>>>>> max function returns what I expect. How do I do this with row?
>>>>>
>>>>> Simple example attached. hp should be 'highest p', ll should be
>>>>> 'lowest l'. I get an error message "Error in 1:row : NA/NaN argument"
>>>>>
>>>>> Thanks,
>>>>> Mark
>>>>>
>>> <SNIP>
>>>>>
>>>>> HighLow = function (MyFrame) {
>>>>> temp1 <- MyFrame$p[1:row]
>>>>> MyFrame$hp <- max(temp1) ## Highest p
>>>>> temp1 <- MyFrame$l[1:row]
>>>>> MyFrame$ll <- min(temp1) ## Lowest l
>>>>>
>>>>> return(MyFrame)
>>>>> }
>>>>
>>>> You get an error in this function because you didn't define row, so R
>>>> assumes you mean the function in the base package, and 1:row doesn't make
>>>> sense.
>>>>
>>>> What you want for the "highest so far" is the cummax (for "cumulative
>>>> maximum") function. See ?cummax.
>>>>
>>>> Duncan Murdoch
>>>>
>>>
>>> Duncan,
>>> OK, thanks. That makes sense, as long as I want the cummax from the
>>> beginning of the data.frame. (Which is exactly what I asked for!)
>>>
>>> How would I do this in the more general case if I was looking for
>>> the cummax of only the most recent 50 rows in my data.frame? What I'm
>>> trying to get down to is that as I fill in my data.frame I need to be
>>> able to get a max or min or standard deviation of the previous so many
>>> rows of data - not the whole column - and I'm just not grasping how to
>>> do this. It seems like I should be able to create a data set that's
>>> only a portion of a column while I'm in the function and then take the
>>> cummax on that, or use it as an input to a standard deviation, etc.?
>>
>> What you describe might be called a "running max". The caTools package has
>> a runmax function that probably does what you want.
>>
>> More generally, you can always write a loop. They aren't necessarily fast
>> or elegant, but they're pretty general. For example, to calculate the max
>> of the previous 50 observations (or fewer near the start of a vector), you
>> could do
>>
>> x <- ... some vector ...
>>
>> result <- numeric(length(x))
>> for (i in seq_along(x)) {
>>   result[i] <- max( x[ max(1, i-49):i ])
>> }
>>
>> Duncan Murdoch
>>
>
> Thanks for the pointer. I'll check it out.
>
> Today I've managed to get pretty much all of my Excel spreadsheet
> built in R except for some of the charts. It took me a week and a half
> in Excel. This is my 3rd full day with R. Charts are next.
>
> I appreciate your help and the help I've gotten from others. Thanks
> so much.
>
> cheers,
> Mark
>
--
Erich Neuwirth
Didactic Center for Computer Science and Institute for Scientific
Computing
University of Vienna