[R] how to subset unique factor combinations from a data frame.
Petr PIKAL
petr.pikal at precheza.cz
Wed Jan 5 12:04:30 CET 2011
Hi
You probably did not notice that I mentioned xtabs before:
as.data.frame(xtabs(~x+xx))
It gives the same result as the table() call:
> u <- as.data.frame(table(x, xx))
> head(u)
x xx Freq
1 A a 18
2 B a 27
3 C a 30
4 D a 30
5 E a 27
6 F a 18
>
> v<-as.data.frame(xtabs(~x+xx))
> head(v)
x xx Freq
1 A a 18
2 B a 27
3 C a 30
4 D a 30
5 E a 27
6 F a 18
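
For the three-factor case from the original question, a minimal sketch could
look like the following. The data frame below is a made-up stand-in (its
values are invented for illustration; only the column names come from the
thread). Note that table() and xtabs() list every combination of levels,
including ones that never occur (Freq of 0), so those rows need to be
filtered out. The last line uses unique(), which is a different technique
from the xtabs approach above and is shown only for comparison.

```r
# Hypothetical stand-in for the poster's data frame; values are invented.
df <- data.frame(
  Year      = c(1990, 1991, 1990, 1991, 1991),
  Commodity = c("Coffee", "Coffee", "Crude oil", "Crude oil", "Coffee"),
  Attribute = c("Production", "Production", "Production", "Production", "Exports"),
  Unit      = c("60 kg", "60 kg", "1000 barrels", "1000 barrels", "60 kg"),
  Value     = c(10, 12, 5, 6, 3)
)

# Count every combination of levels, including those that never occur.
counts <- as.data.frame(xtabs(~ Commodity + Attribute + Unit, data = df))

# Keep only the combinations that actually occur in the data.
observed <- counts[counts$Freq > 0, ]

# Alternative: unique() on the selected columns returns the observed
# combinations directly, without a Freq column.
combos <- unique(df[, c("Commodity", "Attribute", "Unit")])
```

With unique() the Unit column comes along without having to be treated as a
factor, which may matter here since Unit is described as a label rather than
a grouping variable.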
Regards
Petr
r-help-bounces at r-project.org wrote on 05.01.2011 08:46:21:
> Hi Dennis,
>
> It worked! This is what I was looking for. Many thanks.
>
> Rgds,
>
> SNVK
> _____
>
> From: Dennis Murphy [mailto:djmuser at gmail.com]
> Sent: Tuesday, January 04, 2011 9:07 PM
> To: SNV Krishna
> Cc: r-help at r-project.org
> Subject: Re: [R] how to subset unique factor combinations from a data frame.
>
>
> Hi:
>
> Did you try something like
>
> summdf <- as.data.frame(with(df, table(Commodity, Attribute, Unit)))
>
>
> ?
> The rows of the table should represent the unique combinations of the
> three variables....
>
> Here's a simple toy example to illustrate:
> > x <- sample(LETTERS[1:6], 1000, replace = TRUE)
> > xx <- sample(letters[1:6], 1000, replace = TRUE)
> > u <- as.data.frame(table(x, xx))
> > dim(u)
> [1] 36 3
> > head(u)
> x xx Freq
> 1 A a 26
> 2 B a 29
> 3 C a 25
> 4 D a 25
> 5 E a 27
> 6 F a 29
>
> HTH,
> Dennis
>
>
> On Tue, Jan 4, 2011 at 2:19 AM, SNV Krishna <krishna at primps.com.sg> wrote:
>
>
> Hi,
>
> Sorry that my example is not clear. I will give an example of what each
> variable holds. I hope this clearly explains the case.
>
> Names of the dataframe (df) and description
>
> Year :- Year is calendar year, from 1980 to 2010
>
> Country :- is the country name, total no. (levels) of countries is ~ 190
>
> Commodity :- Crude oil, Sugar, Rubber, Coffee .... No. (levels) of
> commodities is 20
>
> Attribute :- Production, Consumption, Stock, Import, Export... Levels ~ 20
>
> Unit :- this is actually not a factor. It describes the unit of Attribute.
> Say the unit for Coffee (commodity) - Production (attribute) is 60 kgs.
> While the unit for Crude oil - Production is 1000 barrels
>
> Value :- value
>
> > tail(df, n = 10)  # example data
>
> Year  Country         Commodity     Attribute       Unit       Value
> 1991  United Kingdom  Wheat, Durum  Total Supply    (1000 MT)     70
> 1991  United Kingdom  Wheat, Durum  TY Exports      (1000 MT)      0
> 1991  United Kingdom  Wheat, Durum  TY Imp. from U  (1000 MT)      0
> 1991  United Kingdom  Wheat, Durum  TY Imports      (1000 MT)     60
> 1991  United Kingdom  Wheat, Durum  Yield           (MT/HA)        5
>
> I hope this is clear. Any suggestions?
>
> Regards,
>
> SNVK
>
> -----Original Message-----
> From: Petr PIKAL [mailto:petr.pikal at precheza.cz]
> Sent: Tuesday, January 04, 2011 4:06 PM
> To: SNV Krishna
> Cc: r-help at r-project.org
> Subject: Odp: [R] how to subset unique factor combinations from a data frame.
>
> Hi
>
> r-help-bounces at r-project.org wrote on 04.01.2011 05:21:25:
>
> > Hi All
> >
> > I have these questions and request members expert view on this.
> >
> > a) I have a dataframe (df) with five factors (identity variables) and a
> > value (measured value). The id variables are Year, Country, Commodity,
> > Attribute, Unit. Value is a value for each combination of these.
> >
> > I would like to get just the unique combinations of Commodity, Attribute
> > and Unit. I just need the unique factor combinations in a dataframe or a
> > table. I know aggregate and subset but don't know how to use them in this
> > context.
>
> with(df, aggregate(Value, list(Commodity, Attribute, Unit), FUN = length))  # or any other summary function
>
> >
> > b) Is it possible to include non-aggregate columns with the aggregate
> > function?
> >
> > Say in the above case > aggregate(Value ~ Commodity + Attribute, data = df,
> > FUN = count). The use of count(Value) is just a roundabout way to return
> > the combinations of Commodity & Attribute, and I would like to include the
> > 'Unit' column in the returned data frame?
>
> Hm. Maybe xtabs? But without any example it is only a guess.
>
> >
> > c) Is it possible to subset based on unique combinations, something
> > like this?
> >
> > > subset(df, unique(Commodity), select = c(Commodity, Attribute, Unit))
> >
> > I know this is not correct as it returns an error 'subset needs a
> > logical evaluation'. I am trying various ways to accomplish the task.
> >
>
> Probably the sqldf package has tools for doing it, but I do not use it, so
> you have to try it yourself.
>
> df[df$Commodity == something, c("Commodity", "Attribute", "Unit")]
>
> is another way.
>
> Anyway, your explanation is ambiguous. Let's say you have three rows with
> the same Commodity. Which row do you want to select?
>
> Regards
> Petr
>
>
> > I will be grateful for any ideas and help.
> >
> > Regards,
> >
> > SNVK
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>