[R] how to subset unique factor combinations from a data frame.
krishna at primps.com.sg
Tue Jan 4 11:19:02 CET 2011
Sorry that my example was not clear. Below I describe what each variable
holds; I hope this explains the case clearly.
Column names of the data frame (df) and their descriptions:
Year: calendar year, from 1980 to 2010
Country: country name; there are ~190 countries (levels)
Commodity: Crude oil, Sugar, Rubber, Coffee, ...; there are 20 commodities
(levels)
Attribute: Production, Consumption, Stock, Import, Export, ...; ~20 levels
Unit: not actually a factor; it describes the unit of the Attribute. For
example, the unit for Coffee (Commodity) Production (Attribute) is 60 kg,
while the unit for Crude oil Production is 1000 barrels.
Value: the measured value
> tail(df, n = 10)   # example data
Year        Country    Commodity      Attribute      Unit Value
1991 United Kingdom Wheat, Durum   Total Supply (1000 MT)    70
1991 United Kingdom Wheat, Durum     TY Exports (1000 MT)     0
1991 United Kingdom Wheat, Durum TY Imp. from U (1000 MT)     0
1991 United Kingdom Wheat, Durum     TY Imports (1000 MT)    60
1991 United Kingdom Wheat, Durum          Yield   (MT/HA)     5
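The R-help posting guide asks for self-contained, reproducible code. A minimal sketch of how the rows printed above could be rebuilt as a data frame that readers can paste into R, assuming the unlabeled last column is Value and that the Attribute/Unit split is as shown (the truncated label "TY Imp. from U" is kept exactly as posted):

```r
# Rebuild the five rows printed by tail() above as a self-contained data frame.
# Assumption: the last (unlabeled) column in the posted output is Value.
df <- data.frame(
  Year      = rep(1991, 5),
  Country   = rep("United Kingdom", 5),
  Commodity = rep("Wheat, Durum", 5),
  Attribute = c("Total Supply", "TY Exports", "TY Imp. from U", "TY Imports", "Yield"),
  Unit      = c(rep("(1000 MT)", 4), "(MT/HA)"),
  Value     = c(70, 0, 0, 60, 5)
)

# dput() prints R code that recreates df exactly; pasting its output into a
# post lets others load the same data with one copy-and-paste.
dput(df)
```

Posting the dput() output (or a few rows like these) usually removes the ambiguity that makes questions hard to answer.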
I hope this is clear. Any suggestions?
From: Petr PIKAL [mailto:petr.pikal at precheza.cz]
Sent: Tuesday, January 04, 2011 4:06 PM
To: SNV Krishna
Cc: r-help at r-project.org
Subject: Re: [R] how to subset unique factor combinations from a data frame
r-help-bounces at r-project.org wrote on 04.01.2011 05:21:25:
> Hi All
> I have a few questions and would appreciate the members' expert views on them.
> a) I have a data frame (df) with five factors (identity variables) and one
> measured value. The id variables are Year, Country, Commodity, Attribute and
> Unit. Value is the value for each combination of these.
> I would like to get just the unique combinations of Commodity, Attribute and
> Unit. I just need the unique factor combinations in a data frame or a ...
> I know aggregate and subset but don't know how to use them in this context.
aggregate(df$Value, by = list(Commodity = df$Commodity, Attribute = df$Attribute,
          Unit = df$Unit), FUN = sum)   # FUN can be any summary function
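For question a), if only the distinct combinations are needed (no summary of Value), base R's unique() applied to the identifier columns is a simpler alternative to aggregate(). A minimal sketch on hypothetical toy data with the same columns as df (the countries, units and values are made up for illustration):

```r
# Hypothetical toy data with the same columns as df in the post (values made up).
df <- data.frame(
  Year      = c(1990, 1991, 1991, 1991, 1990, 1991),
  Country   = c("Brazil", "Brazil", "Colombia", "Brazil", "Norway", "Norway"),
  Commodity = c("Coffee", "Coffee", "Coffee", "Coffee", "Crude oil", "Crude oil"),
  Attribute = c("Production", "Production", "Production", "Exports",
                "Production", "Production"),
  Unit      = c("60 kg", "60 kg", "60 kg", "60 kg", "1000 barrels", "1000 barrels"),
  Value     = c(10, 12, 8, 9, 50, 55)
)

# Keep only the identifier columns, then drop repeated rows; the first
# occurrence of each Commodity/Attribute/Unit combination is retained.
combos <- unique(df[, c("Commodity", "Attribute", "Unit")])
print(combos)
```

On this toy data the result is a data frame with three rows (original row names 1, 4 and 5), one per distinct combination.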
> b) Is it possible to include non-aggregated columns in the output of aggregate?
> Say, in the above case, aggregate(Value ~ Commodity + Attribute, data = df,
> FUN = length). Counting Value is just a roundabout way to return the
> combinations of Commodity & Attribute; can I also include another column
> (e.g. Unit) in the returned data frame?
Hm. Maybe xtabs? But without any example it is only a guess.
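One common way to carry a column such as Unit through aggregate() is to add it as an extra grouping variable; provided Unit is constant within each Commodity/Attribute pair (as the post describes), this does not split any group. xtabs() with an empty left-hand side gives the same combinations as a count table. A sketch on hypothetical toy data (values made up):

```r
# Hypothetical toy data with the same columns as df in the post (values made up).
df <- data.frame(
  Year      = c(1990, 1991, 1991, 1991, 1990, 1991),
  Country   = c("Brazil", "Brazil", "Colombia", "Brazil", "Norway", "Norway"),
  Commodity = c("Coffee", "Coffee", "Coffee", "Coffee", "Crude oil", "Crude oil"),
  Attribute = c("Production", "Production", "Production", "Exports",
                "Production", "Production"),
  Unit      = c("60 kg", "60 kg", "60 kg", "60 kg", "1000 barrels", "1000 barrels"),
  Value     = c(10, 12, 8, 9, 50, 55)
)

# Unit is constant within each Commodity/Attribute pair here, so using it as an
# extra grouping variable just carries it into the result; FUN = length counts rows.
counts <- aggregate(Value ~ Commodity + Attribute + Unit, data = df, FUN = length)
print(counts)

# xtabs() counts rows for every possible combination of the factors; combinations
# that never occur get a count of 0 and are dropped afterwards.
tab   <- xtabs(~ Commodity + Attribute + Unit, data = df)
combo <- as.data.frame(tab)
combo <- combo[combo$Freq > 0, ]
print(combo)
```

If Unit could vary within a Commodity/Attribute pair, the first approach would silently produce separate rows for each Unit, which is worth checking before relying on it.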
> c) Is it possible to subset based on a unique combination, something
> like this?
> > subset(df, unique(Commodity), select = c(Commodity, Attribute, Unit))
> I know this is not correct, as it returns the error "'subset' must be
> logical". I am trying various ways to accomplish the task.
Probably the sqldf package has tools for doing this, but I do not use it, so
you will have to try it yourself.
df[df$Commodity == "Coffee", c("Commodity", "Attribute", "Unit")]
can be another way.
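The subset() call in question c) fails because its second argument must be a logical vector, and unique(Commodity) is not one. duplicated() can supply such a vector: negated, it is TRUE only for the first row of each Commodity/Attribute/Unit combination. A minimal sketch on hypothetical toy data (values made up):

```r
# Hypothetical toy data with the same columns as df in the post (values made up).
df <- data.frame(
  Year      = c(1990, 1991, 1991, 1991, 1990, 1991),
  Country   = c("Brazil", "Brazil", "Colombia", "Brazil", "Norway", "Norway"),
  Commodity = c("Coffee", "Coffee", "Coffee", "Coffee", "Crude oil", "Crude oil"),
  Attribute = c("Production", "Production", "Production", "Exports",
                "Production", "Production"),
  Unit      = c("60 kg", "60 kg", "60 kg", "60 kg", "1000 barrels", "1000 barrels"),
  Value     = c(10, 12, 8, 9, 50, 55)
)

# The identifier columns whose combinations should be unique.
key <- df[, c("Commodity", "Attribute", "Unit")]

# !duplicated(key) is TRUE for the first row of each combination, which is the
# logical vector subset() expects as its second argument.
first <- subset(df, !duplicated(key), select = c(Commodity, Attribute, Unit))
print(first)
```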
Anyway, your explanation is ambiguous. Let's say you have three rows with the
same Commodity. Which row do you want to select?
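The question of which row to keep matters because "first" and "last" rows per combination are generally different rows. As a sketch on hypothetical toy data (values made up), duplicated() with fromLast = TRUE keeps the last occurrence of each combination instead of the first:

```r
# Hypothetical toy data with the same columns as df in the post (values made up).
df <- data.frame(
  Year      = c(1990, 1991, 1991, 1991, 1990, 1991),
  Country   = c("Brazil", "Brazil", "Colombia", "Brazil", "Norway", "Norway"),
  Commodity = c("Coffee", "Coffee", "Coffee", "Coffee", "Crude oil", "Crude oil"),
  Attribute = c("Production", "Production", "Production", "Exports",
                "Production", "Production"),
  Unit      = c("60 kg", "60 kg", "60 kg", "60 kg", "1000 barrels", "1000 barrels"),
  Value     = c(10, 12, 8, 9, 50, 55)
)

key <- df[, c("Commodity", "Attribute", "Unit")]

first_rows <- df[!duplicated(key), ]                   # first row per combination
last_rows  <- df[!duplicated(key, fromLast = TRUE), ]  # last row per combination
print(first_rows)
print(last_rows)
```

Here the Coffee/Production combination maps to the 1990 Brazil row (Value 10) in the first case and to the 1991 Colombia row (Value 8) in the second, so the answer depends on which row is actually wanted.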
> I will be grateful for any ideas and help.