[R] sampling random groups with all observations in the group
Greg Snow
Greg.Snow at intermountainmail.org
Fri Mar 2 22:42:27 CET 2007
One possibility is to use split to create a list with each of your
groups as an element, sample from the list, then combine back into a
data frame. For example:
> mydata <- data.frame(group=sample(LETTERS[1:5], 100, replace=TRUE),
+ x= 1:100, y= rnorm(100) )
> head(mydata)
  group x          y
1     B 1 -1.1709539
2     A 2  0.2438249
3     C 3 -1.9079472
4     E 4  0.6155387
5     E 5 -1.0671110
6     C 6  0.8109344
> mydata2 <- split(mydata, mydata$group)
> mysamp <- sample(5,2)
> mydata3 <- do.call('rbind',mydata2[mysamp])
> summary(mydata3)
 group        x               y
 A: 0   Min.   : 3.00   Min.   :-1.9079
 B: 0   1st Qu.:18.75   1st Qu.:-0.9798
 C:17   Median :46.50   Median :-0.4309
 D:19   Mean   :45.19   Mean   :-0.2333
 E: 0   3rd Qu.:68.25   3rd Qu.: 0.4351
        Max.   :97.00   Max.   : 3.0469
>
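The same idea extends to drawing a fraction of the groups rather than a fixed count. The following is only a sketch under stated assumptions: the names frac, bygroup, n.pick, picked, samp1 and samp2 are made up for illustration, and set.seed is there only so the example is repeatable. It also shows the "sample the group ids, then match" route from the original question, which should give the same rows.

```r
# Sketch: pick a fraction of the groups and keep every row of the chosen groups.
# All names below (frac, bygroup, n.pick, picked, samp1, samp2) are illustrative.
set.seed(42)  # only to make the illustration repeatable
mydata <- data.frame(group = sample(LETTERS[1:5], 100, replace = TRUE),
                     x = 1:100, y = rnorm(100))

frac    <- 0.4                                  # e.g. 0.2 for 20% of the groups
bygroup <- split(mydata, mydata$group)          # one data frame per group
n.pick  <- max(1, round(frac * length(bygroup)))
picked  <- sample(names(bygroup), n.pick)       # sample group ids, not rows

# Route 1: recombine only the chosen list elements
samp1 <- do.call("rbind", bygroup[picked])
samp1$group <- factor(samp1$group)              # keep only the levels actually present

# Route 2: match the sampled ids against the full data set
samp2 <- mydata[mydata$group %in% picked, ]
```

Sampling names(bygroup) instead of sample(5, 2) avoids hard-coding the number of groups, and re-applying factor() removes the zero-count levels that show up in the summary above.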
Hope this helps,
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Wadud, Zia
> Sent: Friday, March 02, 2007 1:12 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] sampling random groups with all observations in the group
>
> Hi
> I have a panel dataset with a large number of groups and
> a differing number of observations for each group. I want to
> randomly select, say, 20% of the groups or 200 groups,
> along with all observations from the selected groups (with the
> corresponding data).
> I guess it is possible to generate a random sample from the
> group ids and then match that with the entire dataset to
> get the intended dataset, but it sounds cumbersome and
> possibly there is an easier way to do this? I checked the
> package 'sampling' and the command 'sample', but they can't do
> exactly this.
> I was wondering if someone on this list would be able to share
> his/her knowledge?
> Thanks in advance,
> Zia
> **********************************************************
> Zia Wadud
> PhD Student
> Centre for Transport Studies
> Department of Civil and Environmental Engineering
> Imperial College London
> London SW7 2AZ
> Tel +44 (0) 207 594 6055