[R] how to subset unique factor combinations from a data frame.

Petr PIKAL petr.pikal at precheza.cz
Wed Jan 5 12:04:30 CET 2011


Hi

You probably did not notice that I mentioned xtabs before.

as.data.frame(xtabs(~x+xx))

> u <- as.data.frame(table(x, xx))
> head(u)
  x xx Freq
1 A  a   18
2 B  a   27
3 C  a   30
4 D  a   30
5 E  a   27
6 F  a   18
> 
> v<-as.data.frame(xtabs(~x+xx))

> head(v)
  x xx Freq
1 A  a   18
2 B  a   27
3 C  a   30
4 D  a   30
5 E  a   27
6 F  a   18
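
One caveat applies to both approaches: table() and xtabs() list every combination of factor levels, including those that never occur (Freq 0). If only the combinations actually present in the data are wanted, filtering on Freq or calling unique() on the columns may be closer to the goal. A minimal sketch on made-up data follows; the data frame name toy, the sample size, and the seed are assumptions chosen only for illustration.

```r
# Toy data in the spirit of the example above (seed and size are arbitrary)
set.seed(1)
toy <- data.frame(
  x  = factor(sample(LETTERS[1:6], 50, replace = TRUE), levels = LETTERS[1:6]),
  xx = factor(sample(letters[1:6], 50, replace = TRUE), levels = letters[1:6])
)

# table() and xtabs() count every combination of levels, observed or not
u <- as.data.frame(with(toy, table(x, xx)))
v <- as.data.frame(xtabs(~ x + xx, data = toy))

# Keep only the combinations that actually occur in the data
observed <- u[u$Freq > 0, c("x", "xx")]

# unique() on the factor columns returns the observed combinations directly
combos <- unique(toy[, c("x", "xx")])
```

With the real data the same idea would read unique(df[, c("Commodity", "Attribute", "Unit")]), assuming the column names given in the original question.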

Regards
Petr


r-help-bounces at r-project.org wrote on 05.01.2011 08:46:21:

> Hi Dennis,
> 
> It worked! This is what I was looking for. Many thanks.
> 
> Rgds, 
> 
> SNVK
>   _____ 
> 
> From: Dennis Murphy [mailto:djmuser at gmail.com] 
> Sent: Tuesday, January 04, 2011 9:07 PM
> To: SNV Krishna
> Cc: r-help at r-project.org
> Subject: Re: [R] how to subset unique factor combinations from a data frame.
> 
> 
> Hi:
> 
> Did you try something like
> 
> summdf <- as.data.frame(with(df, table(Commodity, Attribute, Unit)))
> 
> 
> ? 
> The rows of the table should represent the unique combinations of the three
> variables....
> 
> Here's a simple toy example to illustrate:
> > x <- sample(LETTERS[1:6], 1000, replace = TRUE)
> > xx <- sample(letters[1:6], 1000, replace = TRUE)
> > u <- as.data.frame(table(x, xx))
> > dim(u)
> [1] 36  3
> > head(u)
>   x xx Freq
> 1 A  a   26
> 2 B  a   29
> 3 C  a   25
> 4 D  a   25
> 5 E  a   27
> 6 F  a   29
> 
> HTH,
> Dennis
> 
> 
> On Tue, Jan 4, 2011 at 2:19 AM, SNV Krishna <krishna at primps.com.sg> wrote:
> 
> 
> Hi,
> 
> Sorry that my example is not clear. I will give an example of what each
> variable holds. I hope this clearly explains the case.
> 
> Names of the dataframe (df) and description
> 
> Year :- Year is calendar year, from 1980 to 2010
> 
> Country :- is the country name, total no. (levels) of countries is ~ 190
> 
> Commodity :- Crude oil, Sugar, Rubber, Coffee .... No. (levels) of
> commodities is 20
> 
> Attribute :- Production, Consumption, Stock, Import, Export... Levels ~ 20
> 
> Unit :- this is actually not a factor. It describes the unit of Attribute.
> Say the unit for Coffee (commodity) - Production (attribute) is 60 kgs.
> While the unit for Crude oil - Production is 1000 barrels
> 
> Value :-  value
> 
> > tail(df, n = 10) // example data//
> 
> Year  Country         Commodity     Attribute       Unit       Value
> 1991  United Kingdom  Wheat, Durum  Total Supply    (1000 MT)  70
> 1991  United Kingdom  Wheat, Durum  TY Exports      (1000 MT)  0
> 1991  United Kingdom  Wheat, Durum  TY Imp. from U  (1000 MT)  0
> 1991  United Kingdom  Wheat, Durum  TY Imports      (1000 MT)  60
> 1991  United Kingdom  Wheat, Durum  Yield           (MT/HA)    5
> 
> I hope this is clear. Any suggestions?
> 
> Regards,
> 
> SNVK
> 
> -----Original Message-----
> From: Petr PIKAL [mailto:petr.pikal at precheza.cz]
> Sent: Tuesday, January 04, 2011 4:06 PM
> To: SNV Krishna
> Cc: r-help at r-project.org
> Subject: Re: [R] how to subset unique factor combinations from a data frame.
> 
> Hi
> 
> r-help-bounces at r-project.org wrote on 04.01.2011 05:21:25:
> 
> > Hi All
> >
> > I have these questions and request members' expert views on them.
> >
> > a) I have a data frame (df) with five factors (identity variables) and a
> > measured value. The id variables are Year, Country, Commodity, Attribute,
> > and Unit. Value holds the measurement for each combination of these.
> >
> > I would like to get just the unique combinations of Commodity, Attribute,
> > and Unit, as a data frame or a table. I know aggregate and subset but
> > don't know how to use them in this context.
> 
> aggregate(df$Value, df[c("Commodity", "Attribute", "Unit")], FUN = length)  # or any other summary function
> 
> >
> > b) Is it possible to include non-aggregated columns with the aggregate
> > function?
> >
> > Say, in the above case, aggregate(Value ~ Commodity + Attribute, data = df,
> > FUN = count). The use of count(Value) is just a roundabout way to return the
> > combinations of Commodity & Attribute, and I would like to include the 'Unit'
> > column in the returned data frame.
> 
> Hm. Maybe xtabs? But without any example it is only a guess.
> 
> >
> > c) Is it possible to subset based on unique combinations, something
> > like this?
> >
> > > subset(df, unique(Commodity), select = c(Commodity, Attribute, Unit))
> >
> > I know this is not correct, as it returns an error 'subset needs a
> > logical evaluation'. I am trying various ways to accomplish the task.
> >
> 
> The sqldf package probably has tools for doing it, but I do not use it, so
> you will have to try it yourself.
> 
> df[df$Commodity == something, c("Commodity", "Attribute", "Unit")]
> 
> is another way.
> 
> Anyway, your explanation is ambiguous. Let's say you have three rows with the
> same Commodity. Which row do you want to select?
> 
> Regards
> Petr
> 
> 
> > will be grateful for any ideas and help
> >
> > Regards,
> >
> > SNVK
> >
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> 


