[R] sequential row selection in dataframe

Michael Kubovy kubovy at virginia.edu
Tue Dec 26 13:42:07 CET 2006


On Dec 26, 2006, at 12:07 AM, Pedro Mardones wrote:

> I'm wondering if there is any 'efficient' approach for selecting a
> sample of 'every nth row' from a dataframe. For example, let's use
> the dataframe GAGurine from the MASS library:
>
>> length(GAGurine[,1])
> [1] 314
>
> # select 75% of the dataset, i.e. 236 rows, taking every 2nd row
> # starting from row 1
>> test<-GAGurine[seq(1,314,2),]
>> length(test[,1])
> [1] 157
>
> # so, I still need another 79 rows; one way could be:
>> test2<-GAGurine[-seq(1,314,2),]
>> length(test2[,1])
> [1] 157
>> test3<-test2[seq(1,157,2),]
>
> # and then
>> final<-rbind(test,test3)
>> length(final[,1])
> [1] 236
>
> Does anyone have a better idea to get the same results but without
> creating different datasets like test2 and test3?

A probabilistic approach:

len <- nrow(GAGurine)
GAGu <- GAGurine[sample(len, round(0.75 * len)), ] # 236 rows, in random order
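
A small variation on the above (my addition, not part of the original
reply): sorting the sampled indices keeps the retained rows in their
original order, which matters if the dataframe is ordered, as GAGurine
is by Age.

```r
library(MASS)                                # for the GAGurine data

len <- nrow(GAGurine)
idx <- sort(sample(len, round(0.75 * len)))  # sorted, so row order is kept
GAGu <- GAGurine[idx, ]
nrow(GAGu)                                   # 236
```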

A deterministic one:

len <- nrow(GAGurine)
nr <- 1 # or 2
GAGu2 <- GAGurine[-seq(nr, len, 4), ] # drop every 4th row, leaving 235 rows

whereas

nr <- 3 # or 4
GAGu3 <- GAGurine[-seq(nr, len, 4), ] # leaves the desired 236 rows
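
The deterministic idea generalizes to a one-line helper (my sketch, not
from the thread): to keep roughly a fraction p of the rows, drop every
k-th row with k = round(1/(1 - p)); p = 0.75 gives k = 4.

```r
library(MASS)   # for the GAGurine example data

# Keep roughly a fraction p of rows by dropping every k-th row,
# where k = round(1 / (1 - p)).
thin_rows <- function(df, p = 0.75) {
  k <- round(1 / (1 - p))
  df[-seq(k, nrow(df), by = k), , drop = FALSE]
}

nrow(thin_rows(GAGurine))  # 236 of the original 314 rows
```

No intermediate datasets are created, which was the point of the
original question.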
_____________________________
Professor Michael Kubovy
University of Virginia
Department of Psychology
USPS:     P.O.Box 400400    Charlottesville, VA 22904-4400
Parcels:    Room 102        Gilmer Hall
         McCormick Road    Charlottesville, VA 22903
Office:    B011    +1-434-982-4729
Lab:        B019    +1-434-982-4751
Fax:        +1-434-982-4766
WWW:    http://www.people.virginia.edu/~mk9y/



More information about the R-help mailing list