[R] how to subset unique factor combinations from a data frame.

Tue Jan 4 11:19:02 CET 2011

Hi,

Sorry that my example is not clear. I will give an example of what each
variable holds. I hope this clearly explains the case.

Column names of the data frame (df) and their descriptions:

Year :- Year is calendar year, from 1980 to 2010

Country :- is the country name, total no. (levels) of countries is ~ 190 

Commodity :- Crude oil, Sugar, Rubber, Coffee .... No. (levels) of
commodities is 20

Attribute :- Production, Consumption, Stock, Import, Export... No. (levels) of
attributes is ~ 20

Unit :- this is actually not a factor. It describes the unit of Attribute.
Say the unit for Coffee (commodity) - Production (attribute) is 60 kgs.
While the unit for Crude oil - Production is 1000 barrels

Value :- the measured value

> tail(df, n = 10)   # example data

Year  Country         Commodity     Attribute       Unit       Value
1991  United Kingdom  Wheat, Durum  Total Supply    (1000 MT)     70
1991  United Kingdom  Wheat, Durum  TY Exports      (1000 MT)      0
1991  United Kingdom  Wheat, Durum  TY Imp. from U  (1000 MT)      0
1991  United Kingdom  Wheat, Durum  TY Imports      (1000 MT)     60
1991  United Kingdom  Wheat, Durum  Yield           (MT/HA)        5

I hope this is clear. Any suggestions?

Regards,

SNVK

-----Original Message-----
From: Petr PIKAL [mailto:petr.pikal at precheza.cz] 
Sent: Tuesday, January 04, 2011 4:06 PM
To: SNV Krishna
Cc: r-help at r-project.org
Subject: Odp: [R] how to subset unique factor combinations from a data
frame.

Hi

r-help-bounces at r-project.org wrote on 04.01.2011 05:21:25:

> Hi All
> 
> I have these questions and request members expert view on this. 
> 
> a) I have a dataframe (df) with five factors (identity variables) and a
> value (measured value). The id variables are Year, Country, Commodity,
> Attribute, Unit. Value is the measurement for each combination of these.
> 
> I would like to get just the unique combinations of Commodity, Attribute
> and Unit. I just need the unique factor combinations in a dataframe or a
> table. I know aggregate and subset but don't know how to use them in this
> context.

aggregate(df$Value, list(df$Commodity, df$Attribute, df$Unit), FUN)  # FUN = any summary function
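
If only the distinct combinations are needed and no summary of Value, unique() on the three id columns may be the shortest route. The following is a minimal sketch, not a tested answer for the real data: the toy data frame is made up for illustration (loosely modelled on the rows posted in this thread, with invented 1990 values), and only the column names come from the description.

```r
# Toy data: illustrative values only, not the poster's real data.
df <- data.frame(
  Year      = c(1990, 1991, 1990, 1991, 1991),
  Country   = "United Kingdom",
  Commodity = "Wheat, Durum",
  Attribute = c("Total Supply", "Total Supply", "TY Imports", "TY Imports", "Yield"),
  Unit      = c("(1000 MT)", "(1000 MT)", "(1000 MT)", "(1000 MT)", "(MT/HA)"),
  Value     = c(65, 70, 55, 60, 5),
  stringsAsFactors = TRUE
)

# Keep only the three id columns, then drop repeated rows.
combos <- unique(df[, c("Commodity", "Attribute", "Unit")])
rownames(combos) <- NULL  # renumber rows 1..n for readability
combos
```

unique() applied to a data frame keeps the first occurrence of each distinct row, so the result has one row per Commodity/Attribute/Unit combination that actually occurs in the data.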

> 
> b) Is it possible to include non-aggregated columns with the aggregate
> function?
> 
> Say in the above case > aggregate(Value ~ Commodity + Attribute, data = df,
> FUN = count). The use of count(Value) is just a roundabout way to return the
> combinations of Commodity & Attribute, and I would like to include the 'Unit'
> column in the returned data frame.

Hm. Maybe xtabs? But without any example it is only a guess.
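
To make that guess a little more concrete, here is a rough sketch of two options, assuming Unit is constant within each Commodity/Attribute pair. Adding Unit to the grouping side of the formula carries it into the result, and length serves as a base R stand-in for the counting function. The toy data frame is invented for illustration and is not the real data.

```r
# Toy data: illustrative values only, not the poster's real data.
df <- data.frame(
  Year      = c(1990, 1991, 1990, 1991, 1991),
  Country   = "United Kingdom",
  Commodity = "Wheat, Durum",
  Attribute = c("Total Supply", "Total Supply", "TY Imports", "TY Imports", "Yield"),
  Unit      = c("(1000 MT)", "(1000 MT)", "(1000 MT)", "(1000 MT)", "(MT/HA)"),
  Value     = c(65, 70, 55, 60, 5),
  stringsAsFactors = TRUE
)

# Option 1: put Unit on the grouping side so it appears in the output.
counts <- aggregate(Value ~ Commodity + Attribute + Unit, data = df, FUN = length)
names(counts)[names(counts) == "Value"] <- "n"  # the column now holds row counts
counts

# Option 2: a contingency table of how often each Attribute/Unit pair occurs.
tab <- xtabs(~ Attribute + Unit, data = df)
tab
```

Note that the formula interface of aggregate drops rows with a missing Value by default, so combinations that only occur with NA values would not show up in the first result.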

> 
> c) Is it possible to subset based on unique combinations, something like
> this?
> 
> > subset(df, unique(Commodity), select = c(Commodity, Attribute, Unit))
> 
> I know this is not correct as it returns an error 'subset needs a logical
> evaluation'. I am trying various ways to accomplish the task.
> 

The sqldf package probably has tools for doing this, but I do not use it, so
you will have to try it yourself.

df[df$Commodity == something, c("Commodity", "Attribute", "Unit")]

is another way.

Anyway, your explanation is ambiguous. Let's say you have three rows with the
same Commodity. Which row do you want to select?
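
One possible answer to that question, offered only as a sketch: subset needs a logical vector with one entry per row, and duplicated() supplies one that marks every repeat of a combination, so negating it keeps the first row of each. The toy data frame is again made up for illustration, and "first row wins" is an assumption about what is wanted.

```r
# Toy data: illustrative values only, not the poster's real data.
df <- data.frame(
  Year      = c(1990, 1991, 1990, 1991, 1991),
  Country   = "United Kingdom",
  Commodity = "Wheat, Durum",
  Attribute = c("Total Supply", "Total Supply", "TY Imports", "TY Imports", "Yield"),
  Unit      = c("(1000 MT)", "(1000 MT)", "(1000 MT)", "(1000 MT)", "(MT/HA)"),
  Value     = c(65, 70, 55, 60, 5),
  stringsAsFactors = TRUE
)

id_cols <- c("Commodity", "Attribute", "Unit")

# TRUE for every row whose combination has already appeared further up.
is_repeat <- duplicated(df[, id_cols])

# Keep the first row of each combination, with all columns.
first_rows <- df[!is_repeat, ]
first_rows

# Plain logical indexing on a single condition, as in the line above.
yield_rows <- df[df$Attribute == "Yield", id_cols]
yield_rows
```

If the last row of each combination is wanted instead, duplicated(df[, id_cols], fromLast = TRUE) marks the earlier rows as the repeats.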

Regards
Petr

> I will be grateful for any ideas and help.
> 
> Regards,
> 
> SNVK
> 
>    [[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.