[R] removing dropouts (setting the values to NA)

Fri Feb 14 07:49:03 CET 2003

Dear all 

I hope somebody has encountered a similar problem and 
can give me a hint on how to do it or where to look. 

I have several data sets in DBF format. I can transfer them to R 
data frames, and then I want to perform aggregation or some 
other computations, but my data contain values that I would 
call drop-outs and that I want to discard (see example). 

Usually I find a run of zeros (the measuring device is out of 
order or does not obtain any data) or a gradual decrease of some 
measured values due to a real interruption of the process. I would 
like to do some automatic evaluation to build a logical vector 
where, for instance, TRUE stands for "correct" values and 
FALSE for "drop-outs" (or vice versa). 
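
A minimal sketch of how a run of zeros could be told apart from a 
single zero reading, using run-length encoding. The helper name 
flag_zero_runs and the threshold min_len are assumptions made for 
illustration only; here TRUE marks a drop-out. 

```r
# Hypothetical helper: flag zeros that belong to a run of at least
# `min_len` consecutive zeros, so an isolated zero is not flagged.
flag_zero_runs <- function(v, min_len = 3) {
  r <- rle(v == 0)                              # runs of the zero test
  long_zero <- r$values & r$lengths >= min_len  # keep only long zero runs
  rep(long_zero, r$lengths)                     # back to one flag per value
}

v <- c(5, 0, 6, 7, 0, 0, 0, 0, 8, 9)
flag_zero_runs(v)   # only the four consecutive zeros are flagged
```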

Preferably I would also like to ***discard a few values before and 
after the actual drop-out occurred***. Then I will set all "wrong" 
values in my variables to NA and continue with further computations. 

Here is some foo code that creates artificial drop-outs similar to 
those in my actual data 

x<-seq(0,100,.1) 
y<-sin(x)+rnorm(length(x),mean=0,sd=1) 
y1<-y-c(rep(0,200),exp(x[20:50]),rep(0,770)) 
y<-y1+50 
y<-y*(y>0) 
y[600:700]<-0 

My actual data looks like: 

Date, 		Time, 		Var1, 	Var2, 	Var3, ...... 
01.01.01, 	03:05:00, 	12, 	27, 	0.53, ..... 
01.01.01, 	03:05:15, 	12.2, 	29, 	1.2, ..... 
01.01.01, 	03:05:30, 	12.2, 	29, 	0, ..... 
......... 

in several data sets.  
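
A small sketch of how such a rule could be applied to the measurement 
columns of one data frame while leaving Date and Time alone. The data 
frame dat, its made-up values and the "zero means drop-out" rule are 
assumptions for illustration, not my real data. 

```r
# Stand-in for one imported data set (values invented for illustration)
dat <- data.frame(
  Date = rep("01.01.01", 4),
  Time = c("03:05:00", "03:05:15", "03:05:30", "03:05:45"),
  Var1 = c(12, 12.2, 12.2, 12.1),
  Var2 = c(27, 29, 29, 28),
  Var3 = c(0.53, 1.2, 0, 0.9)
)

# Assumed rule: a zero reading in a measurement column is a drop-out
vars <- c("Var1", "Var2", "Var3")
bad <- dat[vars] == 0      # logical matrix, same shape as dat[vars]
dat[vars][bad] <- NA       # Date and Time are left untouched
```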

I can simply put  

idx1<-y==0  

I can set an arbitrary limit under or over which the value is 
considered a drop-out  

idx2<-y<45 

and I can combine both indexes 

idx<-idx1|idx2 

But I do not know how to easily enlarge the TRUE parts of the index 
vector forwards and backwards from where the actual drop-out occurred. 

The only way I am able to accomplish it is 

changes<-which(diff(idx)!=0)+1 

then select the odd and even values from changes, subtract a certain 
value from the odd ones and add a value to the even ones, and 
construct something like this 

c(rep(FALSE,odd[1]),rep(TRUE,even[1]-odd[1]),rep(FALSE,odd[2]-even[1]),
  rep(TRUE,even[2]-odd[2]),rep(FALSE,length(x)-even[2])) 

which is a little bit complicated and not a very general solution. 
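
For comparison, a sketch of a more general way to widen the TRUE 
parts of an index vector by a fixed margin on each side, without 
counting odd and even change points. The helper name widen_mask and 
the parameters before and after are assumptions for illustration; 
this is one possible approach, not an established function. 

```r
# Hypothetical helper: extend every TRUE run in `mask` by `before`
# positions to the left and `after` positions to the right.
widen_mask <- function(mask, before = 5, after = 5) {
  n <- length(mask)
  hits <- which(mask)                   # positions currently flagged
  if (length(hits) == 0) return(mask)   # nothing to widen
  # each flagged position shifted by every offset in -before..after
  pos <- unique(as.vector(outer(hits, seq(-before, after), "+")))
  pos <- pos[pos >= 1 & pos <= n]       # clip at both ends of the vector
  out <- logical(n)
  out[pos] <- TRUE
  out
}

mask <- c(FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)
widen_mask(mask, before = 1, after = 2)
```

With the index from above this would be used as 
y[widen_mask(idx, before = 10, after = 10)] <- NA, where the margin 
of 10 values is an arbitrary choice. 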

Please can somebody help me find a better procedure or 
function for such drop-out filtering? 

Thank you. 

Petr Pikal
Precheza a.s., Nabř. Dr. E. Beneše 24, 750 62 Přerov
tel: +420581 252 257 ; 724 008 364
petr.pikal at precheza.cz; p.pik at volny.cz
fax +420581 252 561