[R] R 3.1.2 using a custom function in aggregate() function on Windows 7 OS 64bit

Bert Gunter gunter.berton at gene.com
Thu Mar 5 20:19:53 CET 2015


Well, I obviously don't use it either, as I'm just quoting the docs.

I use either by() or tapply().
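
For what it's worth, here is a minimal sketch of that route. It reuses the sample data frame and the data-frame version of d_rule() from Jeff's message quoted below; the names res_by, res_split and cls are only illustrative and do not come from the thread. It is meant as an illustration under those assumptions, not as the definitive way to do this.

```r
# Sample data and d_rule() as given in Jeff's quoted message
dat <- data.frame(a = c(2, 2, 1, 4, 2, 5, 2, 3, 4, 4),
                  x = 1:10,
                  g = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5))

d_rule <- function(DF) {
  i <- which(DF$a == max(DF$a))
  if (length(i) == 1) {
    DF[i, "x"]        # unique maximum of a: take the matching x
  } else {
    min(DF[, "x"])    # tied maximum: fall back to the smallest x
  }
}

# by() hands each group's rows to FUN as a data frame
res_by <- by(dat, dat$g, d_rule)

# The same idea spelled out with split() and sapply()
res_split <- sapply(split(dat, dat$g), d_rule)

# aggregate(), in contrast, gives FUN one column vector at a time,
# so FUN = class reports the class of each column, never "data.frame"
cls <- aggregate(dat[, c("a", "x")], by = dat["g"], FUN = class)

print(res_by)
print(res_split)
print(cls)
```

On this data both res_by and res_split come out as 1, 4, 6, 8, 9 for groups 1 to 5, while cls contains only "numeric" and "integer", which is Jeff's point that aggregate() never shows FUN more than one column at a time.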

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
Clifford Stoll




On Thu, Mar 5, 2015 at 10:47 AM, Jeff Newmiller
<jdnewmil at dcn.davis.ca.us> wrote:
> Bert: using the sample data frame from below, try to interpret the output of this:
>
> aggregate( dat[,1:2], dat[,"g",drop=FALSE], FUN=function(x){print(x);class(x)})
>
> The help text you quote is probably not as clear as it should be. Would the following be better?
>
> "... and FUN is applied to each column in each such subset with further arguments in ... passed to it."
>
> I became aware of this "feature" because this application of exactly the same aggregation function to all of my data columns is not convenient for my day-to-day work. Thus, I don't use "aggregate" very often.
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>                                       Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
> ---------------------------------------------------------------------------
> Sent from my phone. Please excuse my brevity.
>
> On March 5, 2015 8:59:55 AM PST, Bert Gunter <gunter.berton at gene.com> wrote:
>>That's not what ?aggregate says:
>>
>>"aggregate.data.frame is the data frame method. If x is not a data
>>frame, it is coerced to one, which must have a non-zero number of
>>rows. Then, each of the variables (columns) in x is split into subsets
>>of cases (rows) of identical combinations of the components of by, and
>>FUN is applied to each such subset with further arguments in ...
>>passed to it."
>>
>>
>>As I read this, the argument of FUN is a data frame that is a subset
>>of the original frame, defined by the by variable values.
>>
>>
>>No?
>>
>>
>>-- Bert
>>
>>On Thu, Mar 5, 2015 at 8:55 AM, Jeff Newmiller
>><jdnewmil at dcn.davis.ca.us> wrote:
>>> I don't see your point. No matter which version of aggregate you use,
>>> FUN is applied to vectors. Those vectors may be columns in a data frame
>>> or not, but FUN is always given one vector at a time by aggregate.
>>>
>>> On March 5, 2015 8:12:39 AM PST, Bert Gunter <gunter.berton at gene.com> wrote:
>>>>Sorry, Jeff. aggregate() is generic.
>>>>
>>>>From ?aggregate:
>>>>
>>>>"## S3 method for class 'data.frame'
>>>>aggregate(x, by, FUN, ..., simplify = TRUE)"
>>>>
>>>>Cheers,
>>>>Bert
>>>>
>>>>On Thu, Mar 5, 2015 at 7:54 AM, Jeff Newmiller
>>>><jdnewmil at dcn.davis.ca.us> wrote:
>>>>> The aggregate function applies FUN to vectors, not data frames. For
>>>>> example, the default "mean" function accepts a vector such as a column
>>>>> in a data frame and returns a scalar (well, a vector of length 1).
>>>>> Aggregate then calls this function once for each piece of the column(s)
>>>>> you give it. Your function wants two vectors, but aggregate does not
>>>>> understand how to give two inputs.
>>>>>
>>>>> (In the future, please follow R-help mailing list guidelines and post
>>>>> using plain text so your code does not get messed up.)
>>>>>
>>>>> You could use split to break your data frame into a list of data
>>>>> frames, and then sapply to extract the results you are looking for. I
>>>>> prefer to use the plyr or dplyr or data.table packages to do all this
>>>>> for me.
>>>>>
>>>>> d_rule <- function( DF ) {
>>>>>   i <- which( DF$a==max( DF$a ) )
>>>>>   if ( length( i ) == 1 ){
>>>>>     DF[ i, "x" ]
>>>>>   } else {
>>>>>     min( DF[ , "x" ] ) # did you mean min( DF$x[i] ) ?
>>>>>   }
>>>>> }
>>>>>
>>>>> dat <- data.frame( a=c(2,2,1,4,2,5,2,3,4,4)
>>>>>     , x = c(1:10)
>>>>>     , g = c(1,1,2,2,3,3,4,4,5,5)
>>>>>     )
>>>>> # note that cbind on vectors creates a matrix
>>>>> # in a matrix all columns must be of the same type
>>>>> # but data frames generally have a variety of types
>>>>> # so don't use cbind when making a data frame
>>>>>
>>>>> library( dplyr )
>>>>>
>>>>> result <- dat %>% group_by( g ) %>% do( answer = d_rule( . ) ) %>%
>>>>>   as.data.frame
>>>>>
>>>>> On March 4, 2015 2:02:06 PM PST, Typhenn Brichieri-Colombi via R-help
>>>>> <r-help at r-project.org> wrote:
>>>>>>Hello,
>>>>>>
>>>>>>I am trying to use the following custom function in an aggregate
>>>>>>function, but cannot get R to recognize my data. I’ve read the
>>>>>>help on function() and on aggregate() but am unable to solve my
>>>>>>problem. How can I get R to recognize the data inputs for the custom
>>>>>>function nested within aggregate()?
>>>>>>
>>>>>>My custom function is found below, as well as the error message I get
>>>>>>when I run it on a test data set (I will be using this function on a
>>>>>>much larger dataset (over 600,000 rows)).
>>>>>>
>>>>>>Thank you for your time and your help!
>>>>>>
>>>>>>
>>>>>>
>>>>>>d_rule<-function(a,x){
>>>>>>i<-which(a==max(a))
>>>>>>out<-ifelse(length(i)==1, x[i], min(x))
>>>>>>return(out)
>>>>>>}
>>>>>>
>>>>>>a<-c(2,2,1,4,2,5,2,3,4,4)
>>>>>>x<-c(1:10)
>>>>>>g<-c(1,1,2,2,3,3,4,4,5,5)
>>>>>>dat<-as.data.frame(cbind(x,g))
>>>>>>
>>>>>>test<-aggregate(dat, by=list(g), FUN=d_rule,dat$a, dat$x)
>>>>>>Error in dat$x : $ operator is invalid for atomic vectors
>>>>>>
>>>>>>
>>>>>>
>>>>>>       [[alternative HTML version deleted]]
>>>>>>
>>>>>>______________________________________________
>>>>>>R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>>https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>PLEASE do read the posting guide
>>>>>>http://www.R-project.org/posting-guide.html
>>>>>>and provide commented, minimal, self-contained, reproducible code.
>>>
>


