[R] Sample of a subsample
Bert Gunter
bgunter.4567 at gmail.com
Mon Sep 25 23:09:59 CEST 2017
Yes.
Beating a pretty weary horse, a slightly cleaner version of my prior
offering, using with() instead of within(), is:
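## with() only supplies the columns; the assignment is made to dat itself,
## because changes made inside with() are not kept
dat$sampleNo[with(dat,
    sample(var1[!var1%%2 & !sampleNo], 10, replace=FALSE))] <- 2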
with() and within() are convenient ways to avoid having to name the columns
repeatedly via $. Note also the logical subscripting, in which ! first
coerces numeric 0 to FALSE and any nonzero value to TRUE before negating
(I should have used this previously). As before, the sampled var1 values
double as row numbers only because var1 runs from 1 to nrow(dat) here.
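
For anyone who wants to check the two points above, here is a minimal
sketch. The data frame "toy" and the names "even_and_free" and "toy2" are
made up for illustration and are not the data from this thread.

```r
## Hypothetical toy data (an assumption for illustration only)
toy <- data.frame(var1 = 1:6, sampleNo = c(0, 1, 0, 0, 1, 0))

## `!` coerces numbers to logical first: 0 -> FALSE, nonzero -> TRUE,
## so !var1 %% 2 is TRUE for even var1 and !sampleNo is TRUE where it is 0
even_and_free <- with(toy, !var1 %% 2 & !sampleNo)

## Same result as spelling out the comparisons
stopifnot(identical(even_and_free,
                    with(toy, var1 %% 2 == 0 & sampleNo == 0)))

## An assignment made inside with() acts on a temporary copy of the
## column and is discarded, so toy itself is left unchanged
with(toy, sampleNo[even_and_free] <- 2)

## within() returns a modified copy of the data frame, which can be kept
toy2 <- within(toy, sampleNo[even_and_free] <- 2)
```

Printing toy$sampleNo and toy2$sampleNo afterwards shows that only the
within() result carries the 2s.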
Cheers,
Bert
Bert Gunter
"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
On Mon, Sep 25, 2017 at 11:43 AM, Eric Berger <ericjberger at gmail.com> wrote:
> Hi David,
> I was about to post a reply when Bert responded. His answer is good
> and his comment to use the name 'dat' rather than 'data' is instructive.
> I am providing my suggestion as well because I think it may address
> what was causing you some confusion (mainly to use "which", but also
> the missing !)
>
> idx2 <- sample( which( (!data$var1%%2) & data$sampleNo==0 ), size=10,
>                 replace=FALSE)
> data[idx2,]$sampleNo <- 2
>
> Eric
>
>
>
> On Mon, Sep 25, 2017 at 9:03 PM, Bert Gunter <bgunter.4567 at gmail.com>
> wrote:
>
>> For personal aesthetic reasons, I changed the name "data" to "dat".
>>
>> Your code, with a slight modification:
>>
>> set.seed (1357) ## for reproducibility
>> dat <- data.frame(var1=seq(1:40), var2=seq(40,1))
>> dat$sampleNo <- 0
>> idx <- sample(seq(1,nrow(dat)), size=10, replace=F)
>> dat[idx,"sampleNo"] <-1
>>
>> ## yielding
>> > dat
>>
>>    var1 var2 sampleNo
>> 1     1   40        0
>> 2     2   39        1
>> 3     3   38        0
>> 4     4   37        0
>> 5     5   36        0
>> 6     6   35        1
>> 7     7   34        0
>> 8     8   33        0
>> 9     9   32        0
>> 10   10   31        0
>> 11   11   30        0
>> 12   12   29        0
>> 13   13   28        0
>> 14   14   27        0
>> 15   15   26        1
>> 16   16   25        1
>> 17   17   24        0
>> 18   18   23        0
>> 19   19   22        0
>> 20   20   21        1
>> 21   21   20        0
>> 22   22   19        1
>> 23   23   18        0
>> 24   24   17        1
>> 25   25   16        0
>> 26   26   15        1
>> 27   27   14        0
>> 28   28   13        0
>> 29   29   12        0
>> 30   30   11        0
>> 31   31   10        0
>> 32   32    9        0
>> 33   33    8        0
>> 34   34    7        0
>> 35   35    6        1
>> 36   36    5        0
>> 37   37    4        1
>> 38   38    3        0
>> 39   39    2        0
>> 40   40    1        0
>>
>> ## This is basically a transcription of your specification into
>> ## indexing logic
>>
>> dat <- within(dat,sampleNo[sample(var1[(var1%%2 == 0) &
>> sampleNo==0],10,rep=FALSE)] <- 2)
>>
>> ##yielding
>> > dat
>>
>>    var1 var2 sampleNo
>> 1     1   40        0
>> 2     2   39        1
>> 3     3   38        0
>> 4     4   37        2
>> 5     5   36        0
>> 6     6   35        1
>> 7     7   34        0
>> 8     8   33        2
>> 9     9   32        0
>> 10   10   31        2
>> 11   11   30        0
>> 12   12   29        0
>> 13   13   28        0
>> 14   14   27        2
>> 15   15   26        1
>> 16   16   25        1
>> 17   17   24        0
>> 18   18   23        2
>> 19   19   22        0
>> 20   20   21        1
>> 21   21   20        0
>> 22   22   19        1
>> 23   23   18        0
>> 24   24   17        1
>> 25   25   16        0
>> 26   26   15        1
>> 27   27   14        0
>> 28   28   13        2
>> 29   29   12        0
>> 30   30   11        2
>> 31   31   10        0
>> 32   32    9        2
>> 33   33    8        0
>> 34   34    7        2
>> 35   35    6        1
>> 36   36    5        2
>> 37   37    4        1
>> 38   38    3        0
>> 39   39    2        0
>> 40   40    1        0
>>
>> Bert Gunter
>>
>> On Mon, Sep 25, 2017 at 10:27 AM, David Studer <studerov at gmail.com>
>> wrote:
>>
>> > Hello everybody!
>> >
>> > I have the following problem: I'd like to select a sample from a
>> > subsample in a dataset. Actually, I don't want to select it, but to
>> > create a new variable sampleNo that indicates which sample (one or
>> > two) a case belongs to.
>> >
>> > Let's suppose I have a dataset containing 40 cases:
>> >
>> > data <- data.frame(var1=seq(1:40), var2=seq(40,1))
>> >
>> > The first sample (n=10) I drew like this:
>> >
>> > data$sampleNo <- 0
>> > idx <- sample(seq(1,nrow(data)), size=10, replace=F)
>> > data[idx,]$sampleNo <- 1
>> >
>> > Now (and here my problems start), I'd like to draw a second sample
>> > (n=10). But this sample should be drawn only from the cases that don't
>> > belong to the first sample. *Additionally, "var1" should be an even
>> > number.*
>> >
>> > So sampleNo should be 0 for cases that were not drawn at all, 1 for
>> > cases that belong to the first sample and 2 for cases belonging to the
>> > second sample (drawn from cases where sampleNo equals 0 and var1 is
>> > even).
>> >
>> > I was trying to solve it like this:
>> >
>> > idx2<-data$var1%%2 & data$sampleNo==0
>> > sample(data[idx2,], size=10, replace=F)
>> >
>> > But how can I set sampleNo to 2?
>> >
>> >
>> > Thank you very much for your help!
>> >
>> > David
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> >
>
>