[R] User defined function within a formula
William Dunlap
wdunlap at tibco.com
Fri Jul 17 01:51:00 CEST 2015
This might do what you want:
OPoly <- function(x, degree=1, weight=1, coefs=NULL, rangeX=NULL){
weight <- round(weight,0)# weight need to be integer
if(length(weight)!=length(x)) {
weight <- rep(1,length(x))
}
if (is.null(rangeX)) {
rangeX <- range(x)
}
p <- poly(4*(rep(x,weight)-mean(rangeX))/diff(rangeX), degree=degree,
coefs=coefs)
# why t(t(...))? That strips the attributes.
Z <- t( t(p[cumsum(weight),]) * sqrt(attr(p,"coefs")$norm2[-seq(2)]) )[,
degree, drop=FALSE]
class(Z) <- "OPoly"
attr(Z, "coefs") <- attr(p, "coefs")
attr(Z, "rangeX") <- rangeX
Z
}
makepredictcall.OPoly<-function(var,call)
{
if (is.call(call)) {
if (identical(call[[1]], quote(OPoly))) {
if (!is.null(tmp <- attr(var, "coefs"))) {
call$coefs <- tmp
}
if (!is.null(tmp <- attr(var, "rangeX"))) {
call$rangeX <- tmp
}
call$weight <- 1 # weight not relevant in predictions
}
}
call
}
d <- data.frame(Y=1:8, X=log(1:8), Weight=1:8)
fit <- lm(data=d, Y ~ OPoly(X, degree=2, weight=Weight))
predict(fit)[c(3,8)]
predict(fit, newdata=data.frame(X=d$X[c(3,8)])) # same result
Bill Dunlap
TIBCO Software
wdunlap tibco.com
On Thu, Jul 16, 2015 at 4:39 PM, William Dunlap <wdunlap at tibco.com> wrote:
> OPoly<-function(x,degree=1,weight=1){
> weight=round(weight,0)# weight need to be integer
> if(length(weight)!=length(x))weight=rep(1,length(x))
> p=poly(4*(rep(x,weight)-mean(range(x)))/diff(range(x)),degree)
> Z<-(t(t(p[cumsum(weight),])*sqrt(attr(p,"coefs")$norm2[-
> seq(2)]))[,degree])
> class(Z)<-"OPoly";Z
> }
>
> You need to make OPoly to have optional argument(s) that give
> the original-regressor-dependent information to OPoly and then
> have it return, as attributes, the value of those arguments.
> makepredictcall
> will take the attributes and attach them to the call in predvars so
> predict uses values derived from the original regressors, not value derived
> from the data to be predicted from.
>
> Take a look at a pair like makepredictcall.scale() and scale() for an
> example:
> scale has optional arguments 'center' and 'scale' that it returns as
> attributes
> and makepredictcall.scale adds those to the call to scale that it is given.
> Thus when you predict, the scale and center arguments come from the
> original data, not from the data you are predicting from.
>
>
>
>
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Thu, Jul 16, 2015 at 3:43 PM, Kunshan Yin <yinkunshan at gmail.com> wrote:
>
>> Thanks Bill for your quick reply.
>>
>> I tried your solution and it did work for the simple user defined
>> function xploly. But when I try with other function, it gave me error again:
>>
>> OPoly<-function(x,degree=1,weight=1){
>> weight=round(weight,0)# weight need to be integer
>> if(length(weight)!=length(x))weight=rep(1,length(x))
>> p=poly(4*(rep(x,weight)-mean(range(x)))/diff(range(x)),degree)
>>
>> Z<-(t(t(p[cumsum(weight),])*sqrt(attr(p,"coefs")$norm2[-seq(2)]))[,degree])
>> class(Z)<-"OPoly";Z
>> }
>>
>> ##this OPoly is an FORTRAN orthogonal polynomial routine, it first maps
>> the x to range[-2,2] then do QR, then return the results with sqrt(norm2).
>> Comparing with poly, this transformation will make the model coefficients
>> within a similar range as other variables, the R poly routine will usually
>> give you a very large coefficients. I did not find such routine in R, so I
>> have to define this as user defined function.
>> #######
>>
>> I also have following function as you suggested:
>>
>> makepredictcall.OPoly<-function(var,call)
>> {
>> if (is.call(call)) {
>> if (identical(call[[1]], quote(OPoly))) {
>> if (!is.null(tmp <- attr(var, "coefs"))) {
>> call$coefs <- tmp
>> }
>> }
>> }
>> call
>> }
>>
>>
>> But I still got error for following:
>>
>> > g3=glm(lot1 ~ log(u) + OPoly(u,1), data = clotting, family = Gamma)
>>
>> > predict(g3,dc)Error in poly(4 * (rep(x, weight) - mean(range(x)))/diff(range(x)), degree) :
>> missing values are not allowed in 'poly'
>>
>> I thought it might be due to the /diff(range(x) in the function. But
>> even I remove that part, it will still give me error. Any idea?
>>
>> Many thanks in advance.
>>
>> Alex
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Thu, Jul 16, 2015 at 2:09 PM, William Dunlap <wdunlap at tibco.com>
>> wrote:
>>
>>> Read about the 'makepredictcall' generic function. There is a method,
>>> makepredictcall.poly(), for poly() that attaches the polynomial
>>> coefficients
>>> used during the fitting procedure to the call to poly() that predict()
>>> makes.
>>> You ought to supply a similar method for your xpoly(), and xpoly() needs
>>> to return an object of a a new class that will cause that method to be used.
>>>
>>> E.g.,
>>>
>>> xpoly <- function(x,degree=1,...){ ret <- poly(x,degree=degree,...);
>>> class(ret) <- "xpoly" ; ret }
>>> makepredictcall.xpoly <- function (var, call)
>>> {
>>> if (is.call(call)) {
>>> if (identical(call[[1]], quote(xpoly))) {
>>> if (!is.null(tmp <- attr(var, "coefs"))) {
>>> call$coefs <- tmp
>>> }
>>> }
>>> }
>>> call
>>> }
>>>
>>> g2 <- glm(lot1 ~ log(u) + xpoly(u,1), data = clotting, family = Gamma)
>>> predict(g2,dc)
>>> # 1 2 3 4
>>> 5
>>> #-0.01398928608 -0.01398928608 -0.01398928608 -0.01398928608
>>> #-0.01398928608
>>> # 6 7 8 9
>>> #-0.01398928608 -0.01398928608 -0.01398928608 -0.01398928608
>>>
>>> You can see the effects of makepredictcall() in the 'terms' component
>>> of glm's output. The 'variables' attribute of it gives the original
>>> function
>>> calls and the 'predvars' attribute gives the calls to be used for
>>> prediction:
>>> > attr(g2$terms, "variables")
>>> list(lot1, log(u), xpoly(u, 1))
>>> > attr(g2$terms, "predvars")
>>> list(lot1, log(u), xpoly(u, 1, coefs = list(alpha = 40, norm2 = c(1,
>>> 9, 8850))))
>>>
>>>
>>>
>>> Bill Dunlap
>>> TIBCO Software
>>> wdunlap tibco.com
>>>
>>> On Thu, Jul 16, 2015 at 12:35 PM, Kunshan Yin <yinkunshan at gmail.com>
>>> wrote:
>>>
>>>> Hello, I have a question about the formula and the user defined
>>>> function:
>>>>
>>>> I can do following:
>>>> ###Case 1:
>>>> > clotting <- data.frame(
>>>> + u = c(5,10,15,20,30,40,60,80,100),
>>>> + lot1 = c(118,58,42,35,27,25,21,19,18),
>>>> + lot2 = c(69,35,26,21,18,16,13,12,12))
>>>> > g1=glm(lot1 ~ log(u) + poly(u,1), data = clotting, family = Gamma)
>>>> > dc=clotting
>>>> > dc$u=1
>>>> > predict(g1,dc)
>>>> 1 2 3 4 5
>>>> 6 7 8 9
>>>> -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929
>>>> -0.01398929 -0.01398929 -0.01398929
>>>>
>>>> However, if I just simply wrap the poly as a user defined function ( in
>>>> reality I would have my own more complex function) then I will get
>>>> error:
>>>> ###Case 2:
>>>> > xpoly<-function(x,degree=1){poly(x,degree)}
>>>> > g2=glm(lot1 ~ log(u) + xpoly(u,1), data = clotting, family = Gamma)
>>>> > predict(g2,dc)
>>>> Error in poly(x, degree) :
>>>> 'degree' must be less than number of unique points
>>>>
>>>> It seems that the predict always treat the user defined function in the
>>>> formula with I(). My question is how can I get the results for Case2
>>>> same
>>>> as case1?
>>>>
>>>> Anyone can have any idea about this?
>>>>
>>>> Thank you very much.
>>>>
>>>> Alex
>>>>
>>>> [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>>
>>
>
[[alternative HTML version deleted]]
More information about the R-help
mailing list