[R] subsetting a data set

Petr Pikal petr.pikal at precheza.cz
Fri Sep 8 13:19:52 CEST 2006


Hi

On 8 Sep 2006 at 10:33, Graham Smith wrote:

Date sent:      	Fri, 8 Sep 2006 10:33:49 +0100
From:           	"Graham Smith" <myotisone at gmail.com>
To:             	"Petr Pikal" <petr.pikal at precheza.cz>
Copies to:      	r-help at stat.math.ethz.ch
Subject:        	Re: [R] subsetting a data set

> Petr,
> 
> Thanks again, but the data is GQ1, Max is a variable (column)
> 
> So I have used
> 
>  by(GQ1[,"Max"], list(GQ1$Status), summary)
> 
> Which is very good, and is better than the way I did it before by
> summarising for each status level individually, but that still isn't
> combining the data for Status == "Expert" and Status == "Ecol"
> 
> So at the moment the status variable has 3 levels Expert, Ecol and
> Stake,

Look at ?factor for how to deal with factors. If your variable is not a 
factor (see ?str), then turn it into one.

x<-sample(letters[1:3], 20, replace=TRUE) #character vector
x.f<-as.factor(x) #turn it into a factor
> x.f
 [1] b c b a c a c a a a a a b c c c b b c b
Levels: a b c
> levels(x.f)<-c("x","x","y") #rename levels: a and b become x, c becomes y
> x.f
 [1] x y x x y x y x x x x x x y y y x x y x
Levels: x y
>
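
The positional renaming above depends on the order of levels(x.f). As a 
minimal sketch of an alternative that does not depend on that order, 
levels<- also accepts a named list mapping each new level to the old 
levels it absorbs. The names y and y.f and the fixed input vector are 
only for illustration, so that the result is reproducible:

```r
# Fixed input instead of sample(), so the result is reproducible
y <- c("b", "c", "b", "a", "c", "a")
y.f <- factor(y)                 # levels: a b c

# Named list: new level = old level(s) it replaces
levels(y.f) <- list(x = c("a", "b"), y = "c")

y.f            # values are now x or y
levels(y.f)    # "x" "y"
table(y.f)     # counts per merged level
```
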
> 
> I want to analyse that at two levels: Expert and Ecol combined into a
> new level called "AllEcol" and the existing level "Stake"

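so in your case something like 

GQ1$statusComb<-factor(GQ1$Status, labels=c("AllEcol","AllEcol", 
"Stake"))

should do it. Beware of label ordering!!! The labels are matched to 
levels(GQ1$Status) in order, which is alphabetical by default (Ecol, 
Expert, Stake).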
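
A minimal sketch of the whole step on a toy stand-in for GQ1. Only the 
names GQ1, Status, Max and statusComb come from this thread; the values 
are invented. It merges the levels by name rather than by position, so 
it does not rely on the label order, and then reuses the by() call from 
above on the combined factor:

```r
# Toy stand-in for GQ1; the values are invented for illustration
GQ1 <- data.frame(
  Status = factor(c("Expert", "Ecol", "Stake", "Ecol", "Expert", "Stake")),
  Max    = c(10, 12, 7, 15, 11, 5)
)

levels(GQ1$Status)   # alphabetical: "Ecol" "Expert" "Stake"

# Copy the factor, then merge Ecol and Expert into AllEcol by name
GQ1$statusComb <- GQ1$Status
levels(GQ1$statusComb) <- list(AllEcol = c("Ecol", "Expert"), Stake = "Stake")

# Summary of Max for each of the two combined levels
by(GQ1[, "Max"], list(GQ1$statusComb), summary)
```

The original Status column is left untouched, so analyses at three 
levels and at two levels can be run side by side.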

BTW, it would have been good if you had provided a usable example, as 
the posting guide asks. Many times, while trying to put together an 
example, I solve the problem myself.
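
A usable example for this question can be very small. The sketch below 
is an assumption about what the data look like: only the names GQ1, 
Status and Max come from the thread, while the values and the name 
GQ1.eco are made up. It also shows why Status=="Ecol" & Status=="Expert" 
selects nothing (no row can have both values at once), and how %in% 
picks the rows that have either value:

```r
# Made-up data; only the names GQ1, Status and Max come from the thread
GQ1 <- data.frame(
  Status = factor(rep(c("Expert", "Ecol", "Stake"), times = 2)),
  Max    = c(3, 8, 1, 6, 4, 9)
)

# & asks for rows that are Ecol AND Expert at once: there are none
sum(GQ1$Status == "Ecol" & GQ1$Status == "Expert")

# %in% (or |) asks for rows that are Ecol OR Expert
both <- GQ1$Status %in% c("Ecol", "Expert")
summary(GQ1$Max[both])

# New data set with all variables, restricted to those rows;
# droplevels() discards the now-unused level "Stake"
GQ1.eco <- droplevels(GQ1[both, ])
str(GQ1.eco)
```
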

HTH
Petr

> 
> It is this combining the levels that has got me stuck.
> 
> Thanks again,
> 
> Graham
> 
> On 08/09/06, Petr Pikal <petr.pikal at precheza.cz> wrote:
> >
> > Sorry, I did not notice that in your case Max is not a function but
> > your data. So probably
> >
> > by(Max[, your.columns], list(Max$status), summary)
> >
> > is maybe what you want.
> > HTH
> > Petr
> >
> >
> > On 8 Sep 2006 at 10:31, Petr Pikal wrote:
> >
> > From:                   "Petr Pikal" <petr.pikal at precheza.cz>
> > To:                     "Graham Smith" <myotisone at gmail.com>,
> > r-help at stat.math.ethz.ch
> > Date sent:              Fri, 08 Sep 2006 10:31:12 +0200
> > Priority:               normal
> > Subject:                Re: [R] subsetting a data set
> >
> > > Hi
> > >
> > > I am not sure if your Max is the same as max so I am not sure what
> > > you exactly want from your data. However you shall consult
> > > ?tapply, ?by, ?aggregate and maybe also ?"[" together with chapter
> > > 2 in intro manual in docs directory.
> > >
> > > aggregate(data[, some.columns], list(data$factor1, data$factor2),
> > > max)
> > >
> > > will give you the maximum for the specified columns, splitting
> > > the data according to both factors.
> > >
> > > Also, combining summary with max is not common and I wonder what
> > > your output is in this case. I believe that there are six identical
> > > numbers. However, R is case sensitive and maybe Max does something
> > > different from max. In my case it throws an error.
> > >
> > > HTH
> > > Petr
> > >
> > > On 8 Sep 2006 at 8:06, Graham Smith wrote:
> > >
> > > Date sent:            Fri, 8 Sep 2006 08:06:16 +0100
> > > From:                 "Graham Smith" <myotisone at gmail.com>
> > > To:                   r-help at stat.math.ethz.ch
> > > Subject:              [R] subsetting a data set
> > >
> > > > I have a data set called GQ1, which has 20 variables one of
> > > > > which is a factor called Status at three levels "Expert", "Ecol"
> > > > and "Stake"
> > > >
> > > > I have managed to evaluate some of the data split by status
> > > > using commands like:
> > > >
> > > > summary (Max[Status=="Ecol"])
> > > >
> > > > > BUT how do I produce a summary for Ecol and Expert combined, the
> > > > > only example I can find suggests I could use
> > > >
> > > > summary (Max[Status=="Ecol"& Status=="Expert"]) but that doesn't
> > > > work.
> > > >
> > > > > Additionally, in the same vein, I cannot work out how to
> > > > > create a new data set that would contain all the data for all
> > > > > the variables but only for the data where Status = Ecol, or
> > > > > where Status equals Ecol and Expert.
> > > >
> > > > I know this is yet again a very simple problem, but I really
> > > > can't find the solution in the help or the books I have.
> > > >
> > > > Many thanks,
> > > >
> > > > Graham
> > > >
> > > > ______________________________________________
> > > > R-help at stat.math.ethz.ch mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide
> > > > http://www.R-project.org/posting-guide.html and provide
> > > > commented, minimal, self-contained, reproducible code.
> > >
> > > Petr Pikal
> > > petr.pikal at precheza.cz
> > >
> > > ______________________________________________
> > > R-help at stat.math.ethz.ch mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html and provide commented,
> > > minimal, self-contained, reproducible code.
> >
> > Petr Pikal
> > petr.pikal at precheza.cz
> >
> >
> 

Petr Pikal
petr.pikal at precheza.cz


