[R] subsamples and regressions for 100 times

Angela Smith angela.smith2071 at hotmail.com
Tue Feb 17 19:05:26 CET 2015


Dear David and Michael, 
Thank you so much for the code. It helped me understand how to write the loop and run the analysis. I am much obliged for your help. 
Cheers,
AS
=====




> From: dcarlson at tamu.edu
> To: info at aghmed.fsnet.co.uk; angela.smith2071 at hotmail.com; r-help at r-project.org
> Subject: RE: [R] subsamples and regressions for 100 times
> Date: Tue, 17 Feb 2015 17:51:30 +0000
> 
> Expanding a bit on Michael's answer, you don't need the sampling package for this, just the sample.int() function to draw a random set of integers that you will use to extract rows from each of your groups. Then write a function that returns what you want (the regression slopes from each group) and call it repeatedly with the replicate() function. Your problem is a good way to illustrate the lapply(), sapply(), replicate() family of functions in R:
> 
> # Split the data into a list of data frames
> datlist <- split(dat, dat$L_group)
> # Write a function to draw the sample and perform the regression on each group
> slopes <- function(lst) {
> 	# Get the minimum sample size
> 	minsize <- min(sapply(lst, nrow))
> 	# Draw sample (row numbers) of size minsize from each group
> 	samlist <- lapply(sapply(lst, nrow), sample.int, size=minsize)
> 	# Extract sample from each group
> 	samples <- lapply(names(lst), function(x) lst[[x]][samlist[[x]],])
> 	# Run the regressions for each group and extract the slopes
> 	results <- sapply(samples, function(x) coef(lm(co2~temp, x))[2])
> 	# Use the group names to label the slopes
> 	names(results) <- names(lst)
> 	return(results)
> }
> # You can get a single set of results with
> (results <- slopes(datlist))
> #         A         B         C 
> # 1.0128392 0.2658041 1.3423786
> 
> # To get 100 runs
> many <- t(replicate(100, slopes(datlist)))
> head(many)
> #              A         B        C
> # [1,] 1.4326103 0.2658041 1.357475
> # [2,] 1.4754324 0.2658041 1.309208
> # [3,] 0.9838589 0.2658041 1.408987
> # [4,] 0.9993144 0.2658041 1.354297
> # [5,] 1.0134187 0.2658041 1.397112
> # [6,] 1.4922856 0.2658041 1.312531
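
One way to compare the groups once the slopes are collected is to summarise each column of the matrix. The sketch below is an illustration only: to stay self-contained it retypes just the six rows printed by head(many) rather than all 100 runs, and the names many6, col_mean, col_sd and diff_AC are made up here. Group B has the fewest rows (6), so minsize is 6 and every draw takes all of B's rows in a different order, which is why its slope never changes.

```r
# The six rows shown by head(many), typed in by hand (not the full 100 runs)
many6 <- matrix(
  c(1.4326103, 0.2658041, 1.357475,
    1.4754324, 0.2658041, 1.309208,
    0.9838589, 0.2658041, 1.408987,
    0.9993144, 0.2658041, 1.354297,
    1.0134187, 0.2658041, 1.397112,
    1.4922856, 0.2658041, 1.312531),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("A", "B", "C"))
)

# Mean and spread of the slope within each group
col_mean <- colMeans(many6)
col_sd   <- apply(many6, 2, sd)

# Run-by-run difference between two groups, here A minus C
diff_AC <- many6[, "A"] - many6[, "C"]

print(round(rbind(mean = col_mean, sd = col_sd), 4))
print(quantile(diff_AC, c(0.025, 0.5, 0.975)))
```

With the full matrix the same calls can be applied to many directly. Whether a formal test of the slope differences is appropriate depends on the study design, which the thread does not describe.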
> 
> -------------------------------------
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77840-4352
> 
> -----Original Message-----
> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Michael Dewey
> Sent: Tuesday, February 17, 2015 9:52 AM
> To: Angela Smith; r-help at r-project.org
> Subject: Re: [R] subsamples and regressions for 100 times
> 
> Comment inline
> 
> On 17/02/2015 12:40, Angela Smith wrote:
> >
> >
> > Hi R users,
> > I'm new to R, so my problem is probably pretty simple, but I'm stuck.
> >
> > My data consist of two variables (co2 and temp) and one treatment
> > (L_group). The sample size differs among the treatments, so I wanted to
> > make the sample size equal among the three groups (A, B and C) of the
> > treatment.
> >
> 
> Not sure whether that is necessary for regression but you did not tell 
> us why you want to do that.
> 
> > For this, I used a subsampling technique. With subsampling, the data
> > drawn for the three treatment groups differ each time.
> >
> > So I want to run the regression (co2~temp) on 100 subsamples for each
> > treatment group (that is, subsample 100 times).
> >
> 
> The usual way to do this is to store the subsamples in a list and then 
> write a function and use lapply, say to store your models. You then have 
> another list to which you can then apply the extractor function of your 
> choice.
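
The list-based approach described here might look like the following sketch. It rebuilds the example data given further down in this thread in a shorter form, and it draws the subsamples with base R's sample() rather than the sampling package. The names draw_subsample, n_per_group, subsamples, models and slope_table are invented for this illustration, and the seed is arbitrary.

```r
# Example data from the thread, rebuilt in a shorter form (same values)
dat <- data.frame(
  co2 = c(0.15, 0.148, 0.125, 0.145, 0.138, 0.23, 0.26, 0.35,
          0.41, 0.45, 0.39, 0.42, 0.4, 0.43,
          0.26, 0.3, 0.34, 0.141, 0.145, 0.153, 0.151, 0.128, 0.23, 0.26),
  temp = c(0.0119, 0.0122, 0.0089, 0.0115, 0.0101, 0.055, 0.097, 0.22,
           0.339, 0.397, 0.257, 0.434, 0.318, 0.395,
           0.087, 0.13, 0.154, 0.0107, 0.0112, 0.0119, 0.012, 0.0092,
           0.055, 0.089),
  L_group = factor(rep(c("A", "B", "C"), times = c(8, 6, 10)))
)

set.seed(42)  # arbitrary seed, only so the sketch is repeatable
n_per_group <- min(table(dat$L_group))  # size of the smallest group

# Draw n rows, without replacement, from every group
draw_subsample <- function(d, n) {
  rows_by_group <- split(seq_len(nrow(d)), d$L_group)
  d[unlist(lapply(rows_by_group, sample, size = n)), ]
}

# 1. Store the 100 subsamples in a list
subsamples <- replicate(100, draw_subsample(dat, n_per_group),
                        simplify = FALSE)

# 2. Fit one regression per group within each subsample
models <- lapply(subsamples, function(s) {
  lapply(split(s, s$L_group), function(g) lm(co2 ~ temp, data = g))
})

# 3. Apply an extractor (here the temp slope) to every stored model
slope_table <- t(sapply(models, function(m) {
  sapply(m, function(fit) coef(fit)[["temp"]])
}))

head(slope_table)
```

Keeping the fitted models in a list means a different extractor (for example the intercept or the residual standard error) can be applied later without refitting.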
> 
> 
> > That means I will have 100 regression equations. Later, I want to compare
> > the regression slopes among the three groups. Is there a simple way to write
> > a loop so that I can compare them?
> >
> > Thanks in advance!
> >
> >
> >
> > Angela
> >
> > ================
> > Here is the example:
> >
> > dat<-structure(list(co2 = c(0.15, 0.148, 0.125, 0.145, 0.138, 0.23,
> > 0.26, 0.35, 0.41, 0.45, 0.39, 0.42, 0.4, 0.43, 0.26, 0.3, 0.34,
> > 0.141, 0.145, 0.153, 0.151, 0.128, 0.23, 0.26), temp = c(0.0119,
> > 0.0122, 0.0089, 0.0115, 0.0101, 0.055, 0.097, 0.22, 0.339, 0.397,
> > 0.257, 0.434, 0.318, 0.395, 0.087, 0.13, 0.154, 0.0107, 0.0112,
> > 0.0119, 0.012, 0.0092, 0.055, 0.089), L_group = structure(c(1L,
> > 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
> > 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor")), .Names = c("co2",
> > "temp", "L_group"), class = "data.frame", row.names = c(NA, -24L
> > ))
> >
> > head(dat)
> > library(sampling)
> >
> > # strata.sampling -----
> > strata.sampling <- function(data, group,size, method = NULL) {
> >   require(sampling)
> >    if (is.null(method)) method <- "srswor"
> >    temp <- data[order(data[[group]]), ]
> >    ifelse(length(size)> 1,
> >           size <- size,
> >           ifelse(size < 1,
> >                  size <- round(table(temp[group]) * size),
> >                  size <- rep(size, times=length(table(temp[group])))))
> >    strat = strata(temp, stratanames = names(temp[group]),
> >                   size = size, method = method)
> >    getdata(temp, strat)
> > }
> >
> > #--------------------------------------------------
> > sub_dat <- strata.sampling(dat, 'L_group', 4)
> > Lmodel_subdata1<-lm(co2~temp, data=sub_dat)
> > Lmodel_subdata1#coef
> >
> > sub_dat2 <- strata.sampling(dat, 'L_group', 4)
> > Lmodel_subdata2<-lm(co2~temp, data=sub_dat2)
> > Lmodel_subdata2#coef
> >
> > and so on..... (100 times)
> >
> > Table<-rbind(Lmodel_subdata1$coef, Lmodel_subdata2$coef, ....)
> >
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> >
> 
> -- 
> Michael
> http://www.dewey.myzen.co.uk
> 
 		 	   		  