[R] Division of data set with some restriction

Jim Lemon drjimlemon at gmail.com
Tue Jan 26 02:28:58 CET 2016


Hi Muhammad,
There are a large number of approximate answers to your problem. An easy
one is to make the standard deviations of the subgroups maximally
different by dividing the observations into "mids" (observations close to
the mean) and "tails" (observations far from the mean). The following
function sorts the observations, takes the middle half as the "mids" and
compares their mean to the mean of the "tails". Depending upon whether the
distribution is positively or negatively skewed, it then shifts the "mids"
up or down one observation at a time until the direction of the inequality
is reversed. The two subsets of observations are returned in a list. By
printing the successive means, the user can see whether the penultimate
means were closer than the means of the subsets returned. Perhaps it will
be useful.

rw50<-rweibull(50,1)

split_on_mean<-function(x) {
 lenx<-length(x)
 sx<-sort(x)
 # indices of the lower quarter, middle half and upper quarter
 x4<-floor(lenx/4)
 x34<-floor(3*lenx/4)
 lowx<-1:x4
 midx<-(x4+1):x34
 highx<-(x34+1):lenx
 midmean<-mean(sx[midx])
 tailsmean<-mean(sx[c(lowx,highx)])
 if(midmean < tailsmean) {
  # positive skew: shift the "mids" up one observation at a time
  while(midmean < tailsmean) {
   lowx<-c(lowx,midx[1])
   midx<-c(midx[-1],highx[1])
   highx<-highx[-1]
   midmean<-mean(sx[midx])
   tailsmean<-mean(sx[c(lowx,highx)])
   cat(midmean,tailsmean,"\n")
  }
 } else {
  # negative skew: shift the "mids" down one observation at a time
  while(midmean > tailsmean) {
   highx<-c(midx[length(midx)],highx)
   # prepend the new index so that midx stays in ascending order
   midx<-c(lowx[length(lowx)],midx[-length(midx)])
   lowx<-lowx[-length(lowx)]
   midmean<-mean(sx[midx])
   tailsmean<-mean(sx[c(lowx,highx)])
   cat(midmean,tailsmean,"\n")
  }
 }
 return(list(midx=sx[midx],tailsx=sx[c(lowx,highx)]))
}

# example call on the simulated data
split_on_mean(rw50)
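
If the penultimate means do turn out to be closer, one could search for the
closest pair of means directly. The following is only a rough sketch of that
idea, not a replacement for the function above: the name closest_mean_split,
the fixed window of half the observations and the seed are all assumptions
made for illustration. It slides a window of consecutive sorted observations
along the data and keeps the position where the mean of the window is closest
to the mean of the remaining observations.

```r
# Hypothetical helper (the name and the window size are assumptions):
# slide a window of floor(n/2) consecutive sorted observations and keep
# the window whose mean is closest to the mean of the rest.
closest_mean_split <- function(x) {
  stopifnot(length(x) >= 2)
  sx <- sort(x)
  n <- length(sx)
  k <- floor(n / 2)            # size of the "mids" window
  starts <- 1:(n - k + 1)      # every possible window start
  # absolute difference between the window mean and the mean of the rest
  diffs <- sapply(starts, function(s) {
    mid <- s:(s + k - 1)
    abs(mean(sx[mid]) - mean(sx[-mid]))
  })
  best <- which.min(diffs)
  mid <- best:(best + k - 1)
  list(midx = sx[mid], tailsx = sx[-mid], meandiff = diffs[best])
}

set.seed(1)                    # arbitrary seed, only for reproducibility
rw50 <- rweibull(50, 1)
res <- closest_mean_split(rw50)
# compare the means and standard deviations of the two subsets
sapply(res[c("midx", "tailsx")], function(v) c(mean = mean(v), sd = sd(v)))
```

The last line gives a small table of the mean and standard deviation of each
subset, so the remaining difference in means can be checked by eye. The means
will usually still differ slightly, as this is another approximate answer.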

Jim

On Tue, Jan 26, 2016 at 7:12 AM, Muhammad Kashif <mkashif at uaf.edu.pk> wrote:

>
> Dear Group members
>
> Can anyone help me to code this situation? Suppose we have a population
> with some mean and standard deviation. Then there are n1 observations out
> of n which are less than or equal to the mean, and n2 observations out of
> n which are greater than the mean. We want to divide the whole data set
> into two parts that have the same mean but different standard deviations.
>
> For example, suppose we have 50 observations from some distribution, say a
> two-parameter Weibull. We then divide the data into two parts such that the
> two resulting data sets have the same mean and different standard
> deviations.