[R] Sampling question
arun
smartpink111 at yahoo.com
Tue Nov 5 21:59:31 CET 2013
Hi,
You may try:
dat1 <- structure(list(SubID = 1:8, CSE1 = c(6L, 6L, 5L, 5L, 5L, 5L,
3L, 3L), CSE2 = c(5L, 4L, 5L, 4L, 6L, 4L, 6L, 6L), CSE3 = c(6L,
7L, 5L, 3L, 7L, 3L, 6L, 6L), CSE4 = c(2L, 2L, 5L, 4L, 5L, 6L,
3L, 3L), WSE1 = c(6L, 6L, 5L, 4L, 6L, 4L, 6L, 6L), WSE2 = c(2L,
6L, 5L, 4L, 4L, 3L, 5L, 5L), WSE3 = c(2L, 2L, 4L, 5L, 4L, 7L,
2L, 4L), WSE4 = c(4L, 3L, 5L, 2L, 1L, 3L, 1L, 7L)), .Names = c("SubID",
"CSE1", "CSE2", "CSE3", "CSE4", "WSE1", "WSE2", "WSE3", "WSE4"
), class = "data.frame", row.names = c(NA, -8L))
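# For each of `rep` replicates, build one 8-row sample following the rules quoted below.
fun1 <- function(dat, rep) {
  res <- replicate(rep, {
    # Shuffle the subjects, then shuffle each subject's four CSE items (columns 2:5).
    lst1 <- lapply(sample(nrow(dat), nrow(dat)), function(x) sample(dat[x, 2:5], 4))
    names(lst1) <- sapply(lst1, row.names)
    # From the third row on: keep the first CSE item and add the three WSE items
    # (columns 6:9) whose item number differs from it.
    lst1[-c(1:2)] <- lapply(names(lst1)[-c(1:2)], function(i) {
      x1 <- dat[i, 6:9][is.na(match(gsub("^.", "", names(dat[i, 6:9])),
                                    gsub("^.", "", names(lst1[[i]][1]))))]
      cbind(lst1[[i]][1], sample(x1, 3))
    })
    # Stack the rows, prepend SubID, and rename the value columns to "var".
    do.call(rbind, lapply(lst1, function(x) {
      datNew <- cbind(SubID = as.numeric(row.names(x)), x)
      names(datNew)[-1] <- "var"
      datNew
    }))
  })
  res
}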
res1 <- fun1(dat1,5)
lst2 <- lapply(split(res1,col(res1)), function(x) {dat <- do.call(cbind,x); colnames(dat) <- c("SubID", rep("var",4));dat})
do.call(cbind,res1[,1])
do.call(cbind,res1[,2])
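A note on why the do.call(cbind, ...) step is needed: by default replicate() simplifies its results, so data frames come back packed into a list-matrix with one column per replicate. The toy sketch below shows this, together with simplify = FALSE, which keeps each replicate as its own data frame. The toy_draw() helper is hypothetical and only stands in for a real draw.

```r
set.seed(1)

# Hypothetical toy draw: a 3-row data frame with two columns.
toy_draw <- function() data.frame(id = sample(3), var = sample(3))

# Default behaviour: results are simplified into a list-matrix,
# one row per data-frame column and one column per replicate.
packed <- replicate(4, toy_draw())
dim(packed)

# Rebuild the first replicate as a matrix from its column of the list-matrix.
one <- do.call(cbind, packed[, 1])
one

# Alternative: keep every replicate as a separate data frame in a plain list.
kept <- replicate(4, toy_draw(), simplify = FALSE)
kept[[1]]
```

With simplify = FALSE, a single replicate is available directly as kept[[k]], with no reassembly needed.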
A.K.
I have a question about drawing samples from a data frame. This might
sound really tricky. Let me use a data frame I have posted earlier as an example:
SubID CSE1 CSE2 CSE3 CSE4 WSE1 WSE2 WSE3 WSE4
    1    6    5    6    2    6    2    2    4
    2    6    4    7    2    6    6    2    3
    3    5    5    5    5    5    5    4    5
    4    5    4    3    4    4    4    5    2
    5    5    6    7    5    6    4    4    1
    6    5    4    3    6    4    3    7    3
    7    3    6    6    3    6    5    2    1
    8    3    6    6    3    6    5    4    7
This data frame has two sets of variables; each set represents
one scale. As shown above, the first scale, CSE, consists
of four items: CSE1, CSE2, CSE3, and CSE4. The second scale,
WSE, also has four items: WSE1, WSE2, WSE3, and WSE4.
The leftmost column lists the subjects' IDs.
I want to create a new data frame by sampling values at random
from the data frame above. Below is the structure of the new data frame.
SubID var var var var
    s   c   c   c   c
    s   c   c   c   c
    s   c   w   w   w
    s   c   w   w   w
    s   c   w   w   w
    s   c   w   w   w
    s   c   w   w   w
    s   c   w   w   w
In the new data frame:
s = SubID, ranging from 1 to 8
var = variables
c = CSE numbers
w = WSE numbers
Some rules for constructing the new data frame:
1. The top two rows have to be filled with CSE numbers, and the
order of the numbers within each row should be randomized. For example, if
the first row holds the numbers from subject 4, they can appear in the
order 4(CSE2), 5(CSE1), 3(CSE3), 4(CSE4). The second row does not
have to follow the item order of the first row: with the first row as
above, the second row (say, from subject 8) does not have to be 6(CSE2),
3(CSE1), 6(CSE3), 3(CSE4). Numbers in these two rows should be drawn
without replacement.
2. Each of the remaining rows should contain one CSE number in
the leftmost cell and three WSE numbers to its right. In each row,
the three WSE numbers must be the ones that do not correspond
to the CSE item in the leftmost cell. For example, if the leftmost
cell holds 4, the CSE2 number from subject 6, the three WSE numbers
can only be 4(WSE1), 7(WSE3), and 3(WSE4) from subject 6.
3. The numbers in each row can only be drawn from the same
subject. Also, the subjects should be in random order. Specifically, they
do not have to be in the following order:
SubID
1
2
3
4
5
6
7
8
they can be:
SubID
2
8
5
4
1
6
7
3
4. Repeat the whole process 1000 times to draw 1000 random samples.
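The four rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the reply: the helper names (cse_row, mixed_row, draw_sample), the var1..var4 column names, and the extra items column recording which source item each value came from are all hypothetical choices made so that a draw can be checked against the rules.

```r
# Data copied from the table above.
dat <- data.frame(
  SubID = 1:8,
  CSE1 = c(6, 6, 5, 5, 5, 5, 3, 3), CSE2 = c(5, 4, 5, 4, 6, 4, 6, 6),
  CSE3 = c(6, 7, 5, 3, 7, 3, 6, 6), CSE4 = c(2, 2, 5, 4, 5, 6, 3, 3),
  WSE1 = c(6, 6, 5, 4, 6, 4, 6, 6), WSE2 = c(2, 6, 5, 4, 4, 3, 5, 5),
  WSE3 = c(2, 2, 4, 5, 4, 7, 2, 4), WSE4 = c(4, 3, 5, 2, 1, 3, 1, 7)
)
cse_cols <- paste0("CSE", 1:4)
wse_cols <- paste0("WSE", 1:4)

# Rule 1: all four CSE items of subject row i, in random order.
cse_row <- function(dat, i) {
  unlist(dat[i, sample(cse_cols)])
}

# Rule 2: one random CSE item, then the three WSE items whose
# item number differs from it, in random order.
mixed_row <- function(dat, i) {
  k <- sample(4, 1)
  unlist(dat[i, c(cse_cols[k], sample(wse_cols[-k]))])
}

# Rule 3: visit the subjects in random order; every row uses one subject only.
draw_sample <- function(dat, n_cse_rows = 2) {
  ord <- sample(nrow(dat))
  rows <- lapply(seq_along(ord), function(pos) {
    i <- ord[pos]
    vals <- if (pos <= n_cse_rows) cse_row(dat, i) else mixed_row(dat, i)
    data.frame(SubID = dat$SubID[i],
               var1 = vals[[1]], var2 = vals[[2]],
               var3 = vals[[3]], var4 = vals[[4]],
               items = paste(names(vals), collapse = ","),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Rule 4: 1000 independent samples, kept as a list of data frames.
set.seed(2013)
samples <- replicate(1000, draw_sample(dat), simplify = FALSE)
samples[[1]]
```

Keeping the replicates in a list (simplify = FALSE) means samples[[k]] is directly the k-th sample; drop the items column if only the numbers are needed.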
Any ideas? Thanks in advance!! :)