[R] Surprising Behavior of 'tapply'
Rau, Roland
Rau at demogr.mpg.de
Fri Feb 4 10:58:43 CET 2005
Dear helpers,
thank you very much for your advice.
After starting a new R session this morning, I was also unable to reproduce the problem, although the old session still showed it.
One suggestion was that I might have redefined some functions, but this was not the case. The only additional package I had loaded was Hmisc; I loaded it in the new session as well, and it did not cause any problems.
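One way to check the "redefined functions" suggestion in a suspicious session is to look for objects in the workspace (the global environment) that shadow base functions of the same name. A minimal sketch; the helper name `masked_in_workspace` and its default list of function names are hypothetical choices for illustration, not part of R:

```r
# Hypothetical helper (not part of R): return those names in 'fns' that are
# defined directly in the global environment, where they would mask the
# base functions of the same name for code evaluated at top level.
masked_in_workspace <- function(fns = c("tapply", "match.fun", "lapply", "sum")) {
  in_workspace <- vapply(fns, exists, logical(1),
                         envir = globalenv(), inherits = FALSE)
  fns[in_workspace]
}

masked_in_workspace()  # character(0) in a clean session
```

If a name such as "tapply" is reported, `rm(tapply)` removes the workspace copy so that the base version is found again.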
Another suggestion was to use 'xtabs' instead. This also works nicely, but I timed both on my dataset (a moderate 6 MB), and 'tapply' was much faster (0.05 vs. 0.86 seconds of user time), so I assume it is also faster than 'xtabs' for really large datasets:
> system.time(tapply(austria$COUNT, list(austria$sescat, austria$STATUS, austria$SEX), sum))
[1] 0.05 0.00 0.04 NA NA
> system.time(xtabs(austria$COUNT ~., data.frame(ses = austria$sescat, status =austria$STATUS, sex=austria$SEX)))
[1] 0.86 0.00 0.86 NA NA
>
(I repeated the timings several times and also used gc().)
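The 'austria' data are not available here, so the sketch below uses a synthetic stand-in (its size, column names and category levels are hypothetical) to compare the two calls. It checks that both produce the same three-way table of summed counts and reports the elapsed time of each; absolute timings will depend on the machine and the R version.

```r
set.seed(1)
n <- 100000
# Synthetic stand-in for the 'austria' data (hypothetical levels and size).
austria <- data.frame(
  sescat = sample(c("low", "mid", "high"), n, replace = TRUE),
  STATUS = sample(c("employed", "unemployed", "retired"), n, replace = TRUE),
  SEX    = sample(c("F", "M"), n, replace = TRUE),
  COUNT  = rpois(n, 5)
)

# tapply: sum COUNT within every sescat x STATUS x SEX cell.
t_tapply <- system.time(
  tab_tapply <- tapply(austria$COUNT,
                       list(austria$sescat, austria$STATUS, austria$SEX), sum)
)

# xtabs: the same cross-tabulation via a formula; a numeric left-hand side
# is summed within each cell.
t_xtabs <- system.time(
  tab_xtabs <- xtabs(COUNT ~ sescat + STATUS + SEX, data = austria)
)

# Both are 3 x 3 x 2 arrays with the levels in the same sorted order,
# so the cell values can be compared directly.
stopifnot(isTRUE(all.equal(as.numeric(tab_tapply), as.numeric(tab_xtabs))))
cat("tapply elapsed:", t_tapply[["elapsed"]], "s\n")
cat("xtabs  elapsed:", t_xtabs[["elapsed"]], "s\n")
```

The two results agree only when every combination of categories occurs: for an empty cell 'tapply' returns NA, whereas 'xtabs' returns 0.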
Thanks again (in chronological order) to Bert Gunter, Carlos Ortega, James Holtman, and Gabor Grothendieck.
Best,
Roland
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Gabor Grothendieck
Sent: Thursday, February 03, 2005 9:08 PM
To: r-help at stat.math.ethz.ch
Subject: Re: [R] Surprising Behavior of 'tapply'
I tried it on Windows XP with R 2.1.0 and could not replicate it either.
I suggest you start a fresh session and try again.
By the way, you could consider this:
xtabs(count ~., data.frame(sex = sex, income = income))
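On the toy data from the original report, an xtabs call gives the same numbers as the tapply call. A minimal sketch; it spells the formula out as `count ~ sex + income` rather than `count ~ .`, and it builds the data frame directly with data.frame() rather than via as.data.frame(cbind(...)), because cbind() first turns all three vectors into a character matrix, which is why the original code needs the as.numeric(as.character(...)) step:

```r
sex    <- rep(c("F", "M"), 5)
income <- c(rep("low", 5), rep("high", 5))
count  <- 1:10

# data.frame() keeps 'count' numeric, so no conversion is needed afterwards.
mydf <- data.frame(sex = sex, income = income, count = count)

# Sum 'count' for every sex x income combination, in two ways.
by_tapply <- tapply(mydf$count, list(mydf$sex, mydf$income), FUN = sum)
by_xtabs  <- xtabs(count ~ sex + income, data = mydf)

print(by_tapply)  # F: high 16, low 9; M: high 24, low 6
print(by_xtabs)
```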
Carlos Ortega <carlos_ortegafernandez <at> yahoo.es> writes:
:
: Hi,
:
: That is strange; I could not reproduce it...
:
: Regards,
: Carlos.
:
: +++++++++++++++++++++++++++++++++++
: > version
: _
: platform i386-pc-mingw32
: arch i386
: os mingw32
: system i386, mingw32
: status
: major 2
: minor 0.1
: year 2004
: month 11
: day 15
: language R
: > sex <- rep(c("F", "M"), 5)
: > income <- c(rep("low", 5), rep("high", 5))
: > count <- 1:10
: > mydf <- as.data.frame(cbind(sex, income, count))
: > mydf$count = as.numeric(as.character(mydf$count))
: > tapply(mydf$count, list(mydf$sex, mydf$income), FUN=sum)
: high low
: F 16 9
: M 24 6
: ++++++++++++++++++++++++++++++++++++++++++++
:
: --- "Rau, Roland" <Rau <at> demogr.mpg.de> escribió:
: > Dear all,
: >
: > I wanted to make a two-way table of two variables, with the counts
: > stored in another column of a data frame. In version 1.9.1, the
: > behavior is as expected, as shown in this simplified example:
: >
: > > sex <- rep(c("F", "M"), 5)
: > > income <- c(rep("low", 5), rep("high", 5))
: > > count <- 1:10
: > > mydf <- as.data.frame(cbind(sex, income, count))
: > > mydf$count = as.numeric(as.character(mydf$count))
: > > tapply(mydf$count, list(mydf$sex, mydf$income), FUN=sum)
: > high low
: > F 16 9
: > M 24 6
: > > version
: > _
: > platform i386-pc-mingw32
: > arch i386
: > os mingw32
: > system i386, mingw32
: > status
: > major 1
: > minor 9.1
: > year 2004
: > month 06
: > day 21
: > language R
: > >
: >
: > In version 2.0.1, however, I get the following output:
: >
: > > sex <- rep(c("F", "M"), 5)
: > > income <- c(rep("low", 5), rep("high", 5))
: > > count <- 1:10
: > > mydf <- as.data.frame(cbind(sex, income, count))
: > > mydf$count = as.numeric(as.character(mydf$count))
: > > tapply(mydf$count, list(mydf$sex, mydf$income), FUN=sum)
: > Error in get(x, envir, mode, inherits) : variable "FUN" was not found
: > > version
: > _
: > platform i386-pc-mingw32
: > arch i386
: > os mingw32
: > system i386, mingw32
: > status
: > major 2
: > minor 0.1
: > year 2004
: > month 11
: > day 15
: > language R
: > >
: >
: > Was this change in behavior intended as part of the changes to tapply
: > between R 1.9.1 and R 2.0.1?
: > Is R-help the appropriate list for this, or rather R-devel?
: >
: > Thanks,
: > Roland
: >
: >
: >
: > +++++
: > This mail has been sent through the MPI for
: > Demographic Rese...{{dropped}}
: >
: > ______________________________________________
: > R-help <at> stat.math.ethz.ch mailing list
: > https://stat.ethz.ch/mailman/listinfo/r-help
: > PLEASE do read the posting guide!
: > http://www.R-project.org/posting-guide.html
: >
: