[R] Extracting repeated observations from a large data set

Bill.Venables at CMIS.CSIRO.AU
Sun Dec 31 04:03:10 CET 2000



> -----Original Message-----
> From: J.Brian.Adams at stat.math.ethz.ch, PhD
> [mailto:brian_adams at jbadams.com]
> Sent: Sunday, 31 December 2000 6:20
> To: r-help at stat.math.ethz.ch
> Subject: [R] Extracting repeated observations from a large data set
> 
> 
> I have a dataset containing over 750,000 observations.  I have read them
> into an nx6 matrix.  If possible I would like to prune it by 
> extracting only those observations in which a specific characteristic that
> is contained in column j appears at least k times.  I have used the
> following where k=3 and the fifth column contains the test data
> 
> 			ObsMatrix[as.numeric(table(ObsMatrix[,5])) > 3,]
> 
> but it does not seem to work.  

Try putting together what you want step by step instead of all in one hit.

1. Calculate the frequencies of what is in column 5.

	fr <- table(ObsMatrix[, 5])

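2. Which of the values have frequencies bigger than 3?  Here you have to be
careful:

	bigs <- as.numeric(names(fr)[fr > 3])

(Work your way out from the inside - this is the crucial step.  fr > 3 picks
out the large frequencies, names(fr)[...] gives the values they belong to as
character strings, and as.numeric() turns those back into numbers.)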

3. Now pick the rows you wish to retain:

	keep <- match(ObsMatrix[, 5], bigs, 0) > 0

(On later versions you can use keep <- is.element(ObsMatrix[, 5], bigs),
which is much easier to understand.)

4. Finally, get your reduced matrix:

	ObsMatrix <- ObsMatrix[keep, ]
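
To see the four steps at work, here is a small sketch on made-up data.  The
toy matrix and the name "reduced" are only assumptions for illustration and
are not taken from the original data set; the threshold follows the fr > 3
used above.

```r
# Toy stand-in for ObsMatrix (hypothetical data): column 5 holds the test values.
ObsMatrix <- cbind(matrix(0, nrow = 10, ncol = 4),
                   c(7, 7, 7, 7, 2, 2, 9, 9, 9, 9),
                   1:10)

fr   <- table(ObsMatrix[, 5])               # frequency of each distinct value
bigs <- as.numeric(names(fr)[fr > 3])       # the VALUES whose frequency exceeds 3
keep <- match(ObsMatrix[, 5], bigs, 0) > 0  # one TRUE/FALSE per row
reduced <- ObsMatrix[keep, , drop = FALSE]  # every row with a frequent value is kept

# If "at least k times" is what is wanted, the comparison would be fr >= k.
```

Here the values 7 and 9 each occur four times and 2 occurs twice, so all the
rows carrying 7 or 9 are retained, duplicates included.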


Of course you can stitch some of these steps together, but be careful.  Your
mistake was in mixing up frequencies with the values that pertained to them.
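
A rough illustration of that mix-up, again on made-up data: the index in the
original attempt has one entry per distinct value, not one per row, so it
cannot line up with the rows it is meant to select.  The second half shows one
possible way of stitching the steps together by looking up each row's own
frequency; the names bad_index, row_counts and reduced are hypothetical.

```r
# Hypothetical toy data: 10 rows, 3 distinct values in column 5.
ObsMatrix <- cbind(matrix(0, nrow = 10, ncol = 4),
                   c(7, 7, 7, 7, 2, 2, 9, 9, 9, 9),
                   1:10)
fr <- table(ObsMatrix[, 5])

# The original attempt: one logical entry per DISTINCT value,
# which has nothing to do with the position of any row.
bad_index <- as.numeric(fr) > 3
length(bad_index)   # number of distinct values
nrow(ObsMatrix)     # number of rows

# One stitched-together alternative: index the table by name to get
# the frequency belonging to each row's value, one entry per row.
row_counts <- as.vector(fr[as.character(ObsMatrix[, 5])])
keep <- row_counts > 3
reduced <- ObsMatrix[keep, , drop = FALSE]
```

The lookup by as.character() relies on the values printing the same way they
appear in names(fr), which is safe for whole numbers like these but worth
checking for other kinds of data.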

Bill Venables.



> It returns certain rows from the matrix,
> but not necessarily those with more than three repeats, and it only
> returns one row for each match.  I need to be able to keep all of the
> duplicate records in the data.  Is there a way to do this without using
> several nested for loops?
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._


