[R] subsamples and regressions for 100 times

David L Carlson dcarlson at tamu.edu
Tue Feb 17 18:51:30 CET 2015


Expanding a bit on Michael's answer, you don't need the sampling package for this, just the sample.int() function to draw a random set of integers that you then use to extract rows from each of your groups. Then write a function that returns what you want (the regression slopes from each group) and call it repeatedly with replicate(). Your problem is a good way to illustrate the lapply(), sapply(), replicate() family of functions in R:

# Split the data into a list of data frames
datlist <- split(dat, dat$L_group)
# Write a function to draw the sample and perform the regression on each group
slopes <- function(lst) {
	# Get the minimum sample size
	minsize <- min(sapply(lst, nrow))
	# Draw sample (row numbers) of size minsize from each group
	samlist <- lapply(sapply(lst, nrow), sample.int, size=minsize)
	# Extract sample from each group
	samples <- lapply(names(lst), function(x) lst[[x]][samlist[[x]],])
	# Run the regressions for each group and extract the slopes
	results <- sapply(samples, function(x) coef(lm(co2~temp, x))[2])
	# Use the group names to label the slopes
	names(results) <- names(lst)
	return(results)
}
# You can get a single set of results with
(results <- slopes(datlist))
#         A         B         C 
# 1.0128392 0.2658041 1.3423786

# To get 100 runs
many <- t(replicate(100, slopes(datlist)))
head(many)
#              A         B        C
# [1,] 1.4326103 0.2658041 1.357475
# [2,] 1.4754324 0.2658041 1.309208
# [3,] 0.9838589 0.2658041 1.408987
# [4,] 0.9993144 0.2658041 1.354297
# [5,] 1.0134187 0.2658041 1.397112
# [6,] 1.4922856 0.2658041 1.312531
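
To compare the slopes across groups, the 100 runs can be summarized column by column. The following is only a sketch: it rebuilds the example data from the original post so that it runs on its own, uses a condensed variant of slopes() rather than the exact function above, and fixes a seed (the value 42 is an arbitrary assumption). Note that B is the smallest group (6 rows), so drawing minsize rows without replacement always returns all of B, which is why its slope is identical in every run.

```r
# Example data from the original post, rebuilt compactly
dat <- data.frame(
  co2 = c(0.15, 0.148, 0.125, 0.145, 0.138, 0.23, 0.26, 0.35, 0.41, 0.45,
          0.39, 0.42, 0.4, 0.43, 0.26, 0.3, 0.34, 0.141, 0.145, 0.153,
          0.151, 0.128, 0.23, 0.26),
  temp = c(0.0119, 0.0122, 0.0089, 0.0115, 0.0101, 0.055, 0.097, 0.22,
           0.339, 0.397, 0.257, 0.434, 0.318, 0.395, 0.087, 0.13, 0.154,
           0.0107, 0.0112, 0.0119, 0.012, 0.0092, 0.055, 0.089),
  L_group = factor(rep(c("A", "B", "C"), times = c(8, 6, 10)))
)
datlist <- split(dat, dat$L_group)

# Condensed variant of slopes(): one balanced draw, one slope per group
slopes <- function(lst) {
  minsize <- min(sapply(lst, nrow))
  sapply(lst, function(d) {
    rows <- sample.int(nrow(d), size = minsize)
    unname(coef(lm(co2 ~ temp, data = d[rows, ]))["temp"])
  })
}

set.seed(42)  # arbitrary seed, only so the run is repeatable
many <- t(replicate(100, slopes(datlist)))

# Mean, standard deviation and a 95% percentile interval per group
slope_summary <- apply(many, 2, function(x)
  c(mean = mean(x), sd = sd(x), quantile(x, c(0.025, 0.975))))
print(round(slope_summary, 3))
```

Passing a smaller size than the smallest group (for example 4, as in the original question) would make the B slopes vary as well.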

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Michael Dewey
Sent: Tuesday, February 17, 2015 9:52 AM
To: Angela Smith; r-help at r-project.org
Subject: Re: [R] subsamples and regressions for 100 times

Comment inline

On 17/02/2015 12:40, Angela Smith wrote:
>
> Hi R users,
> I'm new to R, so my problem is probably pretty simple, but I'm stuck.
>
> My data consist of two variables, co2 and temp, and one
> treatment (L_group). The sample size differs among the treatments, so
> I wanted to make the sample size equal among the three groups (A, B and C) of the
> treatment.
>

Not sure whether that is necessary for regression but you did not tell 
us why you want to do that.
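
One way to read this comment: unequal group sizes are not in themselves a problem for lm(), and if the goal is only to compare slopes, that can be done on the full data with an interaction term. The sketch below shows that alternative; it is not something proposed in the thread, and the object names fit_common, fit_separate and slope_test are made up for illustration.

```r
# Example data from the original post, rebuilt compactly
dat <- data.frame(
  co2 = c(0.15, 0.148, 0.125, 0.145, 0.138, 0.23, 0.26, 0.35, 0.41, 0.45,
          0.39, 0.42, 0.4, 0.43, 0.26, 0.3, 0.34, 0.141, 0.145, 0.153,
          0.151, 0.128, 0.23, 0.26),
  temp = c(0.0119, 0.0122, 0.0089, 0.0115, 0.0101, 0.055, 0.097, 0.22,
           0.339, 0.397, 0.257, 0.434, 0.318, 0.395, 0.087, 0.13, 0.154,
           0.0107, 0.0112, 0.0119, 0.012, 0.0092, 0.055, 0.089),
  L_group = factor(rep(c("A", "B", "C"), times = c(8, 6, 10)))
)

# One shared slope, separate intercepts per group
fit_common <- lm(co2 ~ temp + L_group, data = dat)
# Separate slope and intercept per group (temp:L_group interaction)
fit_separate <- lm(co2 ~ temp * L_group, data = dat)

# F test of whether allowing group-specific slopes improves the fit
slope_test <- anova(fit_common, fit_separate)
print(slope_test)

# The temp:L_groupB and temp:L_groupC terms are slope differences from group A
print(coef(fit_separate))
```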

> For this I used subsampling. With subsampling, the data drawn for
> the three treatment groups differ each time.
>
> I therefore want to run the regression (co2~temp) on 100
> subsamples for each treatment group.
>

The usual way to do this is to store the subsamples in a list and then 
write a function and use lapply, say to store your models. You then have 
another list to which you can then apply the extractor function of your 
choice.
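
The list-based workflow described above might look roughly like the following. This is a sketch under assumptions: the helper draw_subsample() and the object names are made up for illustration, the subsample size is taken to be the smallest group size, and the seed is arbitrary.

```r
# Example data from the original post, rebuilt compactly
dat <- data.frame(
  co2 = c(0.15, 0.148, 0.125, 0.145, 0.138, 0.23, 0.26, 0.35, 0.41, 0.45,
          0.39, 0.42, 0.4, 0.43, 0.26, 0.3, 0.34, 0.141, 0.145, 0.153,
          0.151, 0.128, 0.23, 0.26),
  temp = c(0.0119, 0.0122, 0.0089, 0.0115, 0.0101, 0.055, 0.097, 0.22,
           0.339, 0.397, 0.257, 0.434, 0.318, 0.395, 0.087, 0.13, 0.154,
           0.0107, 0.0112, 0.0119, 0.012, 0.0092, 0.055, 0.089),
  L_group = factor(rep(c("A", "B", "C"), times = c(8, 6, 10)))
)

set.seed(1)  # arbitrary seed, only so the run is repeatable
n_per_group <- min(table(dat$L_group))

# Hypothetical helper: draw n rows without replacement from each group
draw_subsample <- function(d, n) {
  rows <- unlist(lapply(split(seq_len(nrow(d)), d$L_group),
                        function(i) i[sample.int(length(i), n)]))
  d[rows, ]
}

# Step 1: store the subsamples in a list
subsamples <- replicate(100, draw_subsample(dat, n_per_group),
                        simplify = FALSE)

# Step 2: store the models, one lm fit per group within each subsample
models <- lapply(subsamples, function(s)
  lapply(split(s, s$L_group), function(g) lm(co2 ~ temp, data = g)))

# Step 3: apply an extractor (here the temp slope) to the list of models
slope_table <- t(sapply(models, function(m)
  sapply(m, function(fit) coef(fit)[["temp"]])))
head(slope_table)
```

Keeping the fitted models in a list means other extractors (for example summary() or confint()) can be applied later without refitting.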


> This means that I will have 100 regression equations.  Later, I want to compare the slopes of the
> regressions among the three groups. Is there a simple way to write a loop so that I
> can compare them?
>
> Thanks in advance!
>
>
>
> Angela
>
> ================
> Here is the example:
>
> dat<-structure(list(co2 = c(0.15, 0.148, 0.125, 0.145, 0.138, 0.23,
> 0.26, 0.35, 0.41, 0.45, 0.39, 0.42, 0.4, 0.43, 0.26, 0.3, 0.34,
> 0.141, 0.145, 0.153, 0.151, 0.128, 0.23, 0.26), temp = c(0.0119,
> 0.0122, 0.0089, 0.0115, 0.0101, 0.055, 0.097, 0.22, 0.339, 0.397,
> 0.257, 0.434, 0.318, 0.395, 0.087, 0.13, 0.154, 0.0107, 0.0112,
> 0.0119, 0.012, 0.0092, 0.055, 0.089), L_group = structure(c(1L,
> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
> 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor")), .Names = c("co2",
> "temp", "L_group"), class = "data.frame", row.names = c(NA, -24L
> ))
>
> head(dat)
> library(sampling)
>
> # strata.sampling -----
> strata.sampling <- function(data, group,size, method = NULL) {
>   require(sampling)
>    if (is.null(method)) method <- "srswor"
>    temp <- data[order(data[[group]]), ]
>    ifelse(length(size)> 1,
>           size <- size,
>           ifelse(size < 1,
>                  size <- round(table(temp[group]) * size),
>                  size <- rep(size, times=length(table(temp[group])))))
>    strat = strata(temp, stratanames = names(temp[group]),
>                   size = size, method = method)
>    getdata(temp, strat)
> }
>
> #--------------------------------------------------
> sub_dat <- strata.sampling(dat, 'L_group', 4)
> Lmodel_subdata1 <- lm(co2~temp, data=sub_dat)
> Lmodel_subdata1$coef
>
> sub_dat2 <- strata.sampling(dat, 'L_group', 4)
> Lmodel_subdata2 <- lm(co2~temp, data=sub_dat2)
> Lmodel_subdata2$coef
>
> and so on ... (100 times)
>
> Table <- rbind(Lmodel_subdata1$coef, Lmodel_subdata2$coef, ....)
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Michael
http://www.dewey.myzen.co.uk
