[R] apply and cousins

John Logsdon j.logsdon at quantex-research.com
Thu Jun 9 18:36:31 CEST 2016


Thanks Jim and others (and sorry Jim - an early version of this slipped
into your inbox :))

Apologies for not giving any concrete code - I was trying to explain it in
words.

What I need to do is to fit a simple linear model to successive sections
of a long matrix.

So far, the best solution I have come up with uses apply twice:

Generate some data in a 100000 x 3 matrix:

N = 100000
Z = cbind(1:N,                       # column 1: row index
          cumsum(rnorm(N, 1, 0.01)), # column 2: increasing "time"
          rnorm(N, 1.2, 0.1))        # column 3: measurements

where the first column is an index, the second a monotonically increasing
value representing time, and the third the measurements I want to
process.

Then write a function dVals1:

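# Y: current row of Z; DD: the whole matrix; dT: window length in time units
dVals1 = function(Y, DD, dT) { which.min((Y[2] - dT) > DD[, 2]) }

which identifies the first row whose time is at least the current time
minus dT (which.min on a logical vector returns the position of the first
FALSE).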

So to identify the start of the window (say) 10 time units before each
row, we use apply and prepend the result to the array as a new first
column for later use:

ZZ = cbind(apply(Z,1,dVals1,Z,10),Z)

At the start of the series in particular, there may be no row a full dT
earlier. Every comparison is then FALSE, which.min returns 1, and the
window simply starts at the first row and spans less than dT.
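Because the time column is sorted, the same start pointers can be computed without apply. The sketch below is a minimal illustration, assuming the times are non-decreasing (as a cumsum of positive increments makes them) and R 3.3.0 or later for the `left.open` argument of `findInterval()`; `window_starts` is a hypothetical helper name, not part of the code above. `findInterval(x, vec, left.open = TRUE)` counts the entries of `vec` strictly below each `x`, so adding 1 gives the first row whose time is at least `x`, which is what dVals1 returns, but in O(N log N) rather than scanning the whole column once per row.

```r
# Hypothetical helper: for each row j, the index of the first row whose
# time is >= time[j] - dT. Requires `time` sorted non-decreasingly and
# R >= 3.3.0 (for the left.open argument).
window_starts <- function(time, dT) {
  # left.open = TRUE makes findInterval() count entries strictly below
  # each target; adding 1 gives the first row at or after the target.
  findInterval(time - dT, time, left.open = TRUE) + 1L
}

# Small example in the same layout as Z (index, time, measurement)
set.seed(1)
N <- 1000
Z <- cbind(1:N, cumsum(rnorm(N, 1, 0.01)), rnorm(N, 1.2, 0.1))
starts <- window_starts(Z[, 2], 10)
ZZ <- cbind(starts, Z)  # start pointer prepended, as with apply()
```

For the full 100000-row matrix this replaces the apply(Z, 1, dVals1, Z, 10) call with a single vectorised lookup.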

I now have start and finish pointers for each row (the prepended start
column and the original index column), so I can proceed to fit a simple
linear model with the following function:

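# D2: one row of ZZ (start pointer, index, time, measurement); DD: all of ZZ
dVals2 = function(D2, DD) {
  if ((D2[2] - D2[1] + 1) < 10) { return(rep(0, 2)) } # reject windows of < 10 rows
  DX = DD[D2[1]:D2[2], ]
  # in ZZ, time is column 3 and the measurements are column 4
  Res = as.vector(lm(DX[, 4] ~ DX[, 3])$coefficients)
  return(Res)
}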

which returns two zeros if the window holds fewer than 10 values and
otherwise returns the intercept and slope of the fit over the specified
range.
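The original question was how to get the row index into apply. One way, sketched here under the assumption that ZZ has the column layout built above (start pointer, index, time, measurement), is to loop over `seq_len(nrow(ZZ))` with vapply, so the function receives a row number and can look up whatever rows it needs. `fit_window` is a hypothetical name; it calls `lm.fit()` on an explicit design matrix, which skips the formula handling in `lm()` and is usually noticeably faster per call, though it is still one fit per row.

```r
# Hypothetical row-number version of dVals2. Assumes ZZ columns are
# 1 = start pointer, 2 = index (row number), 3 = time, 4 = measurement.
fit_window <- function(i, ZZ, min_n = 10) {
  rows <- ZZ[i, 1]:ZZ[i, 2]                   # this row's window
  if (length(rows) < min_n) return(c(0, 0))   # too few rows: return zeros
  X <- cbind(1, ZZ[rows, 3])                  # design matrix: intercept + time
  unname(lm.fit(X, ZZ[rows, 4])$coefficients) # c(intercept, slope)
}

# Small example data in the same layout, start pointers found with which.min
set.seed(1)
N <- 500
Z <- cbind(1:N, cumsum(rnorm(N, 1, 0.01)), rnorm(N, 1.2, 0.1))
starts <- vapply(Z[, 2], function(t) which.min((t - 10) > Z[, 2]), integer(1))
ZZ <- cbind(starts, Z)

# vapply passes each row number i to fit_window(); t() gives an N x 2 matrix
res <- t(vapply(seq_len(nrow(ZZ)), fit_window, numeric(2), ZZ = ZZ))
```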

Applying dVals2 to the whole data set with:

t(apply(ZZ,1,dVals2,DD=ZZ))

does the job, I think, returning the results as an N x 2 matrix.
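If the per-row fits are still too slow, note that the least-squares intercept and slope depend only on the window sums n, Sx, Sy, Sxx and Sxy (slope = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2), intercept = (Sy - slope*Sx) / n), and each of those can be read off running cumulative sums, so all N fits reduce to a few vector operations. This is a swapped-in technique, not the apply approach above; `rolling_ols` is a hypothetical helper, and the window ends are assumed to be the row numbers. Subtracting cumulative sums can lose precision when the sums become very large, so on a long series it would be worth checking some windows against lm().

```r
# Hypothetical fully vectorised rolling least-squares fit of y on x over
# windows starts[i]:ends[i], built from running sums instead of lm() calls.
rolling_ols <- function(x, y, starts, ends, min_n = 10) {
  m <- mean(x)
  x <- x - m                           # centring reduces rounding error
  cs  <- function(v) c(0, cumsum(v))   # cs(v)[k + 1] == sum(v[1:k])
  win <- function(s) s[ends + 1] - s[starts]  # sum of v[starts:ends]
  n   <- ends - starts + 1
  Sx  <- win(cs(x));     Sy  <- win(cs(y))
  Sxx <- win(cs(x * x)); Sxy <- win(cs(x * y))
  slope <- (n * Sxy - Sx * Sy) / (n * Sxx - Sx^2)
  intercept <- (Sy - slope * Sx) / n - slope * m  # undo the centring
  out <- cbind(intercept = intercept, slope = slope)
  out[n < min_n, ] <- 0                # same convention as dVals2: zeros
  out
}

# Small example; start pointers via findInterval (needs R >= 3.3.0)
set.seed(2)
N <- 300
Z <- cbind(1:N, cumsum(rnorm(N, 1, 0.01)), rnorm(N, 1.2, 0.1))
starts <- findInterval(Z[, 2] - 10, Z[, 2], left.open = TRUE) + 1L
res <- rolling_ols(Z[, 2], Z[, 3], starts, ends = Z[, 1])  # N x 2
```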

> Hi John,
> With due respect to the other respondents, here is something that might
> help:
>
> # get a vector of values
> foo<-rnorm(100)
> # get a vector of increasing indices (aka your "recent" values)
> bar<-sort(sample(1:100,40))
> # write a function to "clump" the adjacent index values
> clump_adj_int<-function(x) {
>  index_list<-list(x[1])
>  list_index<-1
>  for(i in 2:length(x)) {
>   if(x[i]==x[i-1]+1)
>    index_list[[list_index]]<-c(index_list[[list_index]],x[i])
>   else {
>    list_index<-list_index+1
>    index_list[[list_index]]<-x[i]
>   }
>  }
>  return(index_list)
> }
> index_clumps<-clump_adj_int(bar)
> # write another function to sum the values
> sum_subsets<-function(indices,vector)
> return(sum(vector[indices],na.rm=TRUE))
> # now "apply" the function to the list of indices
> lapply(index_clumps,sum_subsets,foo)
>
> Jim
>
>
> On Thu, Jun 9, 2016 at 2:41 AM, John Logsdon
> <j.logsdon at quantex-research.com> wrote:
>> Folks
>>
>> Is there any way to get the row index into apply as a variable?
>>
>> I want a function to do some sums on a small subset of some very long
>> vectors, rolling through the whole vectors.
>>
>> apply(X,1,function {do something}, other arguments)
>>
>> seems to be the way to do it.
>>
>> The subset I want is the most recent set of measurements only - perhaps a
>> couple of hundred out of millions - but I can't see how to index each
>> value. The ultimate output should be a matrix of results the length of
>> the input vector. But to do the sum I need to access the current row
>> number.
>>
>> It is easy in a loop but that will take ages. Is there any vectorised
>> apply-like solution to this?
>>
>> Or does apply etc only operate on each row at a time, independently of
>> other rows?
>>
>>
>> Best wishes
>>
>> John
>>
>> John Logsdon
>> Quantex Research Ltd
>> +44 161 445 4951/+44 7717758675
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
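Jim's clump_adj_int() can also be written without the loop: a new clump starts wherever the gap to the previous index is not exactly 1, so a cumulative sum of those breaks labels the clumps. A minimal sketch, reusing the foo and bar of Jim's example (with a seed added for reproducibility); `group` and `clump_sums` are illustrative names.

```r
# Vectorised alternative to clump_adj_int(): start a new clump wherever the
# gap to the previous index is not exactly 1, and number clumps with cumsum.
set.seed(3)
foo <- rnorm(100)
bar <- sort(sample(1:100, 40))
group <- cumsum(c(1, diff(bar) != 1))   # clump id for each element of bar
index_clumps <- split(bar, group)       # list of runs of adjacent indices
clump_sums <- vapply(index_clumps,
                     function(ix) sum(foo[ix], na.rm = TRUE), numeric(1))
# or in one step: tapply(foo[bar], group, sum)
```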


Best wishes

John

John Logsdon
Quantex Research Ltd
+44 161 445 4951/+44 7717758675



More information about the R-help mailing list