[R] aggregating along bins and bin-quantiles
Ivan Alves
papucho at mac.com
Tue Oct 21 09:25:36 CEST 2008
Dear all,
Thanks to Jim and Mark for suggesting that I include reproducible
code. Please note that the enclosed file needs to go into the
home folder, or else the path for reading the CSV file must be changed. I
hope no encoding issues emerge when reading it.
And the code:
library(Hmisc) # needed for cut2(), which marks the quantile a given line belongs to
a <- read.csv(file = "~/example.csv", colClasses = c("Date", "numeric")) # beware of the path
dim(a) # should give "[1] 5076 2"
# should give the sum by year, but on the quantiles for the whole population
aggregate(a$value, list(Date = a[, "Date"], Quantile = cut2(a$value, g = 10)), sum)
# gives the error mentioned below
aggregate(a$value, list(Date = a[, "Date"], Quantile = tapply(a$value, a$Date, cut2, g = 10)), sum)
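A possible way around the error, offered only as a sketch: tapply() returns a list with one element per Date, whereas aggregate() needs a grouping vector as long as the values. ave() applies a function within each group and returns a vector of the original length, so it may fit here. The data below are simulated stand-ins for example.csv, decile() is a hypothetical base-R helper (not cut2 from Hmisc), and it returns the decile index rather than an interval label.

```r
# Simulated stand-in for example.csv (assumption: two dates, 100 values each)
set.seed(1)
a <- data.frame(
  Date  = rep(as.Date(c("2006-12-31", "2007-12-31")), each = 100),
  value = c(rnorm(100, mean = 10), rnorm(100, mean = 50))
)

# Hypothetical helper: index (1..g) of the quantile bin each element of x falls in
decile <- function(x, g = 10) {
  breaks <- quantile(x, probs = seq(0, 1, length.out = g + 1), na.rm = TRUE)
  # unique() guards against duplicated breaks when x has many ties
  cut(x, breaks = unique(breaks), include.lowest = TRUE, labels = FALSE)
}

# ave() computes the deciles separately for each Date and returns
# a vector aligned with a$value, which aggregate() can group on
a$Quantile <- ave(a$value, a$Date, FUN = decile)

# Sum of the values by Date and by decile within that Date
res <- aggregate(a$value, list(Date = a$Date, Quantile = a$Quantile), sum)
```

The resulting data frame has one row per Date and within-Date decile, with the sums in column x. If interval labels like those of cut2 are wanted, the helper would need to return them as character strings instead.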
Once again, many thanks for any help.
Ivan
On 21 Oct 2008, at 02:40, jim holtman wrote:
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
> You need to at least post a subset of your data so that we can
> understand the data structures that you are using. 'dput' will create
> an easily readable format for posting your data (much easier than if
> you post the listing of a table). Usually the cause is some 'type mismatch',
> which means we really need the data to run the script against.
>
> On Mon, Oct 20, 2008 at 6:38 PM, Ivan Alves <papucho at mac.com> wrote:
>> Dear all,
>>
>> I would like to aggregate a data frame (consisting of 2 columns - one
>> for the bins, say factors, and one for the values) along bins and
>> quantiles within the bins.
>>
>> I have tried
>>
>> aggregate(data.frame$values, list(bin = data.frame$bin, Quantile = cut2(data.frame$values, g = 10)), sum)
>>
>> but then the quantiles apply to the population as a whole and not the
>> individual bins. Upon this realisation I have tried
>>
>> aggregate(data.frame$values, list(bin = data.frame$bin, Quantile = tapply(data.frame$values, data.frame$bin, cut2, g = 10)), sum)
>>
>> which gives the following error:
>>
>> Error in sort.list(unique.default(x), na.last = TRUE) :
>> 'x' must be atomic for 'sort.list'
>> Have you called 'sort' on a list?
>>
>> Clearly I am doing something wrong, but I cannot figure out what. I
>> believe the error stems from either (a) the output of tapply being a
>> list with one element per bin, rather than a vector of the same
>> length as the values, or (b) aggregate somehow not liking that the
>> second grouping (the quantiles within the bins) is not sorted nicely.
>>
>> 1. Do you have a reference for doing the summation on both bins and
>> quantiles within the bins?
>> 2. If not, can you give me some guidance as to what I am doing wrong
>> and how I can solve the sort/list issue?
>>
>> Any help would be greatly appreciated
>>
>> Kind regards,
>>
>> Ivan Alves
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?