[R] how to subset unique factor combinations from a data frame.
petr.pikal at precheza.cz
Tue Jan 4 12:05:39 CET 2011
r-help-bounces at r-project.org wrote on 04.01.2011 11:19:02:
> Sorry that my example is not clear. I will give an example of what each
> variable holds. I hope this clearly explains the case.
> Names of the dataframe (df) and description
> Year :- Year is calendar year, from 1980 to 2010
> Country :- is the country name, total no. (levels) of countries is ~ 190
> Commodity :- Crude oil, Sugar, Rubber, Coffee .... No. (levels) of
> commodities is 20
> Attribute: - Production, Consumption, Stock, Import, Export... Levels ~
> Unit :- this is actually not a factor. It describes the unit of
> Say the unit for Coffee (commodity) - Production (attribute) is 60 kgs.
> While the unit for Crude oil - Production is 1000 barrels
> Value :- value
> > tail(df, n = 10) // example data//
> Year Country        Commodity    Attribute       Unit      Value
> 1991 United Kingdom Wheat, Durum Total Supply    (1000 MT)    70
> 1991 United Kingdom Wheat, Durum TY Exports      (1000 MT)     0
> 1991 United Kingdom Wheat, Durum TY Imp. from U  (1000 MT)     0
> 1991 United Kingdom Wheat, Durum TY Imports      (1000 MT)    60
> 1991 United Kingdom Wheat, Durum Yield           (MT/HA)       5
> I hope this is clear. Any suggestions?
The suggestion is still the same: use aggregate or any other similar
function, maybe from the plyr package. No matter how exactly you describe
your data, if you fail to show the code you used and how that code failed
to deliver the desired result, you will get only vague responses.
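
For the "unique combinations" part of the question, a minimal sketch could
look like the one below. The toy data frame is made up for illustration;
only the column names come from the original post. It assumes that one row
per distinct Commodity/Attribute/Unit combination is what is wanted:
unique() on those three columns returns exactly that, and aggregate() with
length additionally counts how many rows sit behind each combination.

```r
# Hypothetical toy data standing in for the poster's data frame `df`;
# the values are invented, only the column names follow the post.
df <- data.frame(
  Year      = c(1990, 1991, 1990, 1991, 1991),
  Country   = c("United Kingdom", "United Kingdom", "France", "France", "France"),
  Commodity = c("Wheat, Durum", "Wheat, Durum", "Wheat, Durum", "Wheat, Durum", "Coffee"),
  Attribute = c("TY Imports", "TY Imports", "Yield", "Yield", "Production"),
  Unit      = c("(1000 MT)", "(1000 MT)", "(MT/HA)", "(MT/HA)", "(60 KG)"),
  Value     = c(55, 60, 4, 5, 0),
  stringsAsFactors = TRUE
)

# One row per distinct Commodity/Attribute/Unit combination
combos <- unique(df[, c("Commodity", "Attribute", "Unit")])
rownames(combos) <- NULL  # drop the original row numbers

# The same combinations, plus the number of rows behind each one
counts <- aggregate(Value ~ Commodity + Attribute + Unit, data = df, FUN = length)
names(counts)[names(counts) == "Value"] <- "n"

print(combos)
print(counts)
```

Because Unit is listed among the grouping columns, it is carried into the
result without being aggregated itself.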
> -----Original Message-----
> From: Petr PIKAL [mailto:petr.pikal at precheza.cz]
> Sent: Tuesday, January 04, 2011 4:06 PM
> To: SNV Krishna
> Cc: r-help at r-project.org
> Subject: Re: [R] how to subset unique factor combinations from a data
> r-help-bounces at r-project.org wrote on 04.01.2011 05:21:25:
> > Hi All
> > I have these questions and request members expert view on this.
> > a) I have a dataframe (df) with five factors (identity variables) and
> > (measured value). The id variables are Year, Country, Commodity,
> > Unit. Value is a value for each combination of this.
> > I would like to get just the unique combination of Commodity,
> > Attribute
> > Unit. I just need the unique factor combination into a dataframe or a
> > I know aggregate and subset but don't know how to use them in this context.
> aggregate(df$Value, list(df$Commodity, df$Attribute, df$Unit), FUN = length)  # any summary function can replace length
> > b) Is it possible to include non-aggregate columns with aggregate
> > say in the above case > aggregate(Value ~ Commodity + Attribute, data
> > =
> > FUN = count). The use of count(Value) is just a roundabout way to return
> > combinations of Commodity & Attribute, and I would like to include
> > column in the returned data frame?
> Hm. Maybe xtabs? But without any example it is only a guess.
> > c) Is it possible to subset based on unique combination, some thing
> > like this.
> > > subset(df, unique(Commodity), select = c(Commodity, Attribute,
> > know this is not correct as it returns an error 'subset needs a
> > logical evaluation'. Trying various ways to accomplish the task.
> The sqldf package probably has tools for this, but I do not use it, so
> you will have to try it yourself.
> df[df$Commodity == something, c("Commodity", "Attribute", "Unit")]
> could be another way.
> Anyway, your explanation is ambiguous. Let's say you have three rows with
> the same Commodity. Which row do you want to select?
> > will be grateful for any ideas and help
> > Regards,
> > SNVK
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > and provide commented, minimal, self-contained, reproducible code.