[R] categorical data

Petr Pikal petr.pikal at precheza.cz
Thu Aug 10 12:11:39 CEST 2006


Hi

On 10 Aug 2006 at 9:19, Christian Oswald wrote:

Date sent:      	Thu, 10 Aug 2006 09:19:06 +0200
From:           	Christian Oswald <oswald at dhlaw.de>
To:             	r-help at stat.math.ethz.ch
Subject:        	Re: [R] categorical data
Send reply to:  	oswald at dhlaw.de

> Hello,
> 
> that's what I need: a list sorted first by year and then by
> category. But I get an error message:
> 
> > df
>       df     cate b    c
>  [1,] "2006" "a1" "1"  "1"
>  [2,] "2006" "a2" "2"  "2"
>  [3,] "2005" "a1" "3"  "3"
>  [4,] "2004" "a3" "1"  "1"
>  [5,] "2004" "a2" "2"  "2"
>  [6,] "2005" "a1" "3"  "3"
>  [7,] "2003" "a2" "11" "11"
>  [8,] "2003" "a1" "2"  "2"
>  [9,] "2006" "a2" "3"  "3"

This is not a data frame but a character matrix; try str(df). It was
probably constructed by cbind(...); try using data.frame(...) instead.
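A minimal sketch of the difference, using a few illustrative values in the shape of the posted matrix (the vector names year, cate and b are assumptions). cbind() returns a matrix, and a matrix holds a single type, so mixing numbers with strings turns every cell into a string. data.frame() keeps one type per column.

```r
# Illustrative values in the shape of the posted matrix (names assumed).
year <- c(2006, 2006, 2005)
cate <- c("a1", "a2", "a1")
b    <- c(1, 2, 3)

# cbind() returns a matrix; a matrix holds one type only, so mixing
# numbers and strings turns every cell into a string.
m <- cbind(year, cate, b)
mode(m)                      # "character"
m[, "b"]                     # "1" "2" "3"

# data.frame() keeps a separate type for each column.
DF <- data.frame(year, cate, b, stringsAsFactors = FALSE)
sapply(DF, class)            # year "numeric", cate "character", b "numeric"
```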

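Or you can try

as.data.frame(df), but then you need to convert the resulting factors
back to numeric with as.numeric(as.character(x)), because as.numeric()
applied directly to a factor returns its internal codes. See
?as.character
?as.numeric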
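A short sketch of that factor pitfall on a hypothetical two-column character matrix. stringsAsFactors = TRUE is passed explicitly because character columns became factors by default in R of this era, whereas the default has been FALSE since R 4.0.0.

```r
# Hypothetical character matrix like the one from cbind() in the thread.
m <- cbind(cate = c("a1", "a2", "a1"), b = c("11", "3", "2"))

# Character columns became factors by default in R of this era; ask for
# that explicitly so the sketch behaves the same in current R.
DF <- as.data.frame(m, stringsAsFactors = TRUE)

levels(DF$b)                     # "11" "2" "3"  (sorted as text)
as.numeric(DF$b)                 # 1 3 2   the level codes, not the values
as.numeric(as.character(DF$b))   # 11 3 2  the intended numbers

# Convert only the columns that really hold numbers.
DF$b <- as.numeric(as.character(DF$b))
```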

HTH
Petr



> > res<-aggregate( df[,c(3,4)], list(df$year,df$cate), sum)
> Fehler in as.vector(x, mode) : Argument hat ungültigen 'mode'
> 
> 
> (Error in as.vector(x, mode) : argument has invalid 'mode')
> 
> I tested the mode and got "character". Can someone explain
> what that means?
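Mode "character" means every cell of df is a text string, and sum() cannot add strings, which is why aggregate() with sum cannot work on those columns as they are. A sketch of one way out, under stated assumptions: the matrix below is retyped from the printout (its year column is labelled df there, so that name is kept), rebuilt as a data frame with numeric columns, and then aggregated and sorted.

```r
# The posted matrix retyped as character data; the year column is
# labelled "df" in the printout, so that name is kept here.
m <- cbind(df   = c("2006", "2006", "2005", "2004", "2004",
                    "2005", "2003", "2003", "2006"),
           cate = c("a1", "a2", "a1", "a3", "a2", "a1", "a2", "a1", "a2"),
           b    = c("1", "2", "3", "1", "2", "3", "11", "2", "3"))
m <- cbind(m, c = m[, "b"])      # column c repeats b in the printout

mode(m)                          # "character": every cell is a string
# sum() refuses character input, so summing these columns cannot work.
summed <- try(sum(m[, "b"]), silent = TRUE)
inherits(summed, "try-error")    # TRUE

# Rebuild as a data frame with numeric columns, then aggregate.
DF <- data.frame(year = as.numeric(m[, "df"]),
                 cate = m[, "cate"],
                 b    = as.numeric(m[, "b"]),
                 c    = as.numeric(m[, "c"]),
                 stringsAsFactors = FALSE)
res <- aggregate(DF[, c("b", "c")],
                 by = list(year = DF$year, cate = DF$cate), FUN = sum)
res <- res[order(res$year, res$cate), ]
res
```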
> 
> Christian
> 
> 
> 
> On Wed, 2006-08-09 at 18:07 +0200, Christian Oswald wrote:
> > > Dear List,
> > >
> > > I need a grouped list by two categorical variables. I have a
> > > data.frame like this:
> > >
> > >   year cat.   b   c
> > > 1 2006   a1 125 212
> > > 2 2006   a2 256 212
> > > 3 2005   a1  14  12
> > > 4 2004   a3 565 123
> > > 5 2004   a2 156 789
> > > 6 2005   a1   1 456
> > > 7 2003   a2 786 123
> > > 8 2003   a1 421 569
> > > 9 2002   a2 425 245
> > >
> > > I need a list with the sum of b and c for every year and every
> > > category (a1, a2 or a3) within that year. I have used tapply() to
> > > compute the sum for every year or for every category, but how can
> > > I combine the two grouping variables?
> 
> Christian,
> 
> Is this what you want (using DF as your data frame)?
> 
> > > aggregate(DF[, c("b", "c")],
>             by = list(Year = DF$year, Cat = DF$cat.),
>             sum)
>   Year Cat   b   c
> 1 2003  a1 421 569
> 2 2005  a1  15 468
> 3 2006  a1 125 212
> 4 2002  a2 425 245
> 5 2003  a2 786 123
> 6 2004  a2 156 789
> 7 2006  a2 256 212
> 8 2004  a3 565 123
> 
> You can also reorder the results by Year and Cat:
> 
> > > DF.result <- aggregate(DF[, c("b", "c")],
>                          by = list(Year = DF$year, Cat = DF$cat.), sum)
> 
> > > DF.result[order(DF.result$Year, DF.result$Cat), ]
>   Year Cat   b   c
> 4 2002  a2 425 245
> 1 2003  a1 421 569
> 5 2003  a2 786 123
> 6 2004  a2 156 789
> 8 2004  a3 565 123
> 2 2005  a1  15 468
> 3 2006  a1 125 212
> 7 2006  a2 256 212
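For anyone wanting to rerun these two steps, a self-contained sketch with the data from the original question typed in by hand (stringsAsFactors = FALSE is an assumption added so the result does not depend on the R version). The last call uses the formula interface of aggregate(), which later R versions provide as an equivalent; it is not part of the answer above.

```r
# Data from the original question, typed in by hand.
DF <- data.frame(
  year = c(2006, 2006, 2005, 2004, 2004, 2005, 2003, 2003, 2002),
  cat. = c("a1", "a2", "a1", "a3", "a2", "a1", "a2", "a1", "a2"),
  b    = c(125, 256, 14, 565, 156, 1, 786, 421, 425),
  c    = c(212, 212, 12, 123, 789, 456, 123, 569, 245),
  stringsAsFactors = FALSE
)

# Step 1: sum b and c within each Year x Cat combination present in DF.
DF.result <- aggregate(DF[, c("b", "c")],
                       by = list(Year = DF$year, Cat = DF$cat.), FUN = sum)

# Step 2: sort by Year, then by Cat within each Year.
DF.result <- DF.result[order(DF.result$Year, DF.result$Cat), ]
print(DF.result)

# Formula interface (later R versions): same sums, original column names.
DF.result2 <- aggregate(cbind(b, c) ~ year + cat., data = DF, FUN = sum)
```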
> 
> 
> 
> Note that tapply() can only handle one 'X' vector at a time, whereas
> aggregate() can handle multiple 'X' columns in one call. For example:
> 
> > > tapply(DF$b, list(DF$year, DF$cat.), sum)
>       a1  a2  a3
> 2002  NA 425  NA
> 2003 421 786  NA
> 2004  NA 156 565
> 2005  15  NA  NA
> 2006 125 256  NA
> 
> will give you the sum of 'b' for each combination of Year and Cat
> in a 2-D table, but I suspect this is not the output format you
> want. You also get NAs in the cells for combinations of Year and
> Cat that do not occur in your data.
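A sketch of two ways around those limits, with the question's data typed in by hand: calling tapply() once per column via lapply() and then replacing the NA cells with 0, or using xtabs(), a different base R function that sums a response over a cross-classification and leaves absent combinations at 0. Whether an empty cell should read 0 or NA is a judgment call for the analysis at hand.

```r
# Data from the original question, typed in by hand.
DF <- data.frame(
  year = c(2006, 2006, 2005, 2004, 2004, 2005, 2003, 2003, 2002),
  cat. = c("a1", "a2", "a1", "a3", "a2", "a1", "a2", "a1", "a2"),
  b    = c(125, 256, 14, 565, 156, 1, 786, 421, 425),
  c    = c(212, 212, 12, 123, 789, 456, 123, 569, 245),
  stringsAsFactors = FALSE
)

# tapply() takes one value vector, so call it once per column; the
# result is a named list of year x category tables (one for b, one for c).
tabs <- lapply(DF[, c("b", "c")],
               function(x) tapply(x, list(DF$year, DF$cat.), sum))

# Combinations absent from the data come back as NA; set them to 0
# only if an empty cell should count as a zero sum.
tabs <- lapply(tabs, function(tab) { tab[is.na(tab)] <- 0; tab })
tabs$b

# xtabs() builds the same kind of table for b, with 0 for absent cells.
xb <- xtabs(b ~ year + cat., data = DF)
xb
```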
> 
> HTH,
> 
> Marc Schwartz
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html and provide commented,
> minimal, self-contained, reproducible code.

Petr Pikal
petr.pikal at precheza.cz


