[R] Filtering out a data.frame

Tue Jun 8 06:18:33 CEST 2010

Jeff08 wrote:
> Sample Data.Frame format
> 
> Name is Returns.nodup
> 
>             X       id ticker      date_ adjClose totret RankStk
> 427225 427225 00174410    AHS 2001-11-13    21.66    100    1235
> 
> 
> "id" uniquely defines a row
> 
> 
> What I am trying to do is filter out ids that have fewer than 1500 data
> points (by date)
> 
> First, I used
> 
> total<-by(Returns.nodup, Returns.nodup$id,nrow)
> 
> which subsetted by ID and calculated the number of data points for each ID
> 
> Now I am trying to figure out a way to use this to filter out the original
> data.frame (Returns.nodup)
> 
> I have tried using the following, but it is VERY slow:
> 
> z<-unlist(lapply(1:length(y), function(i) which(a$id==y[i]) ))
> Returns.filtered<-Returns.nodup[z,]
> 
> Is there a faster way to do this?
> 

Most likely, yes.  But without a reproducible example, it's difficult to think 
about the problem.  Can you please give us one?

If not, my guess is that you can cobble something together using ?table and
?"%in%" (the operator has to be quoted for the help lookup to parse).
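
For what it's worth, here is a rough sketch of the table / %in% idea. It is not
tested against your data: the toy data.frame and the min.obs threshold below are
made up for illustration, and only the id column is taken from your post.

```r
# Toy stand-in for Returns.nodup (made-up values; only the id column matters)
Returns.nodup <- data.frame(
  id       = c("A", "A", "A", "B", "C", "C"),
  adjClose = c(21.66, 21.70, 21.81, 10.02, 5.50, 5.61),
  stringsAsFactors = FALSE
)

# Hypothetical threshold for the toy data; the original question uses 1500
min.obs <- 2

# Count the rows for each id, then keep the ids that meet the threshold
counts   <- table(Returns.nodup$id)
keep.ids <- names(counts)[counts >= min.obs]

# One vectorised membership test instead of a which() call per id
Returns.filtered <- Returns.nodup[Returns.nodup$id %in% keep.ids, ]
```

With min.obs set to 1500 the same three lines should apply to the real
data.frame, assuming each row is one date for one id.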