[R] Proportional response and boosting
Imelda.Somodi
imelda.somodi at gmail.com
Tue Jan 20 15:24:20 CET 2009
Dear Gerard,
Thank you very much for the quick answer. I am a bit uncertain what you
meant by the total frequency of type A. I have data on the extent of type A at a
location (let's call it freqA); this can be taken as the frequency, yes. I also
have information on the total extent of all the natural vegetation types (A
to Z); let's call the latter freqAZ. Both freqA and freqAZ are vectors, with
individual values for each observation point.
So, to make clear what I have done so far, I modelled it just as you wrote:
pi <- freqA/freqAZ   # proportion of type A per site (note: masks R's built-in constant pi)
veg.glm <- glm(pi ~ x, weights = freqAZ, family = binomial)
Is that what you meant too? Or what would you propose to use as weights and for
standardisation instead of freqAZ (since you wrote "total freq of type A")? The
sum of freqA over all the observations? That would make the weight a scalar. I'm
sure you meant something else.
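A minimal sketch, assuming simulated data: in R's binomial glm a proportion response is paired with weights giving the number of trials behind each proportion, so the weights are a vector with one total per observation (here freqAZ), not a scalar. The sketch fits the proportion form and the equivalent cbind(successes, failures) form and checks that their coefficients agree. The data, the predictor x and the coefficients used to simulate freqA are invented for illustration; the proportion is stored in p because pi is a built-in constant in R.

```r
# Hypothetical simulated data, one row per observation point (site)
set.seed(1)
n <- 200
x <- runif(n, -2, 2)                                  # a single predictor
freqAZ <- sample(5:50, n, replace = TRUE)             # total natural-vegetation extent per site
freqA <- rbinom(n, size = freqAZ,                     # extent of type A at that site
                prob = plogis(0.5 + 1.2 * x))

p <- freqA / freqAZ      # proportion of type A ("p" rather than "pi", a built-in constant)

# Proportion response: weights are the per-site totals (a vector, not a scalar)
fit_prop <- glm(p ~ x, weights = freqAZ, family = binomial)

# Equivalent two-column form: successes and failures per site
fit_cbind <- glm(cbind(freqA, freqAZ - freqA) ~ x, family = binomial)

print(all.equal(coef(fit_prop), coef(fit_cbind)))     # the two fits agree
```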
Thank you,
Imelda
Gerard M. Keogh wrote:
>
> Quick response on the binomial:
>
> If possible, I would suggest that you model
>
> pi = (number/freq of type A) / (total_freq of type A)
>
> veg.glm <- glm(pi ~ x, weights = total_freq, family = binomial)
>
> The glm method is supposed to work only on the natural numbers (incl. 0!)
> but it also works for decimal data; it gives a warning in these cases,
> which can be ignored.
>
> Hope this helps!
> Gerard
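A minimal sketch of the warning Gerard mentions, assuming simulated data with non-integer areas: binomial() warns "non-integer #successes in a binomial glm!" because weights times proportion is not a whole number, while quasibinomial() fits the same mean model without that check. The coefficient estimates are identical; quasibinomial() additionally estimates a dispersion parameter instead of fixing it at 1, which may be worth inspecting, since treating every square meter as an independent trial can make the binomial standard errors too small. capture_warnings() is a hypothetical helper written only for this example.

```r
# Hypothetical simulated data: non-integer areas and continuous proportions
set.seed(2)
n <- 150
x <- runif(n, -2, 2)
area_total <- runif(n, 1, 30)                            # total natural area per site (not integer)
prop_A <- plogis(-0.3 + 0.8 * x + rnorm(n, sd = 0.5))    # proportion of type A, strictly in (0, 1)

# Run a model-fitting function, collecting (and muffling) any warnings it raises
capture_warnings <- function(fit_fun) {
  warns <- character(0)
  fit <- withCallingHandlers(
    fit_fun(),
    warning = function(w) {
      warns <<- c(warns, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(fit = fit, warnings = warns)
}

bin <- capture_warnings(function()
  glm(prop_A ~ x, weights = area_total, family = binomial))
qb <- capture_warnings(function()
  glm(prop_A ~ x, weights = area_total, family = quasibinomial))

print(bin$warnings)                               # contains the "non-integer #successes" warning
print(qb$warnings)                                # expected to be empty
print(all.equal(coef(bin$fit), coef(qb$fit)))     # same point estimates
print(summary(qb$fit)$dispersion)                 # estimated rather than fixed at 1
```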
>
> "Imelda.Somodi" <imelda.somodi at gmail.com>
> Sent by: r-help-bounces at r-project.org
> To: r-help at r-project.org
> Date: 20/01/2009 09:11
> Subject: [R] Proportional response and boosting
>
> Dear experts of boosting!
>
> I am planning to build vegetation models via boosting with either gbm or
> mboost. My problem is that my response variable is the proportion of a
> vegetation type in natural vegetation at a location.
>
> ResponseA = (area of vegetation type A) / (area of all natural vegetation types)
>
> That means that the response has a continuous distribution between 0 and 1,
> with many 0s and 1s as well. As I understand from reading these forums, it
> is pretty close to a beta distribution, except that the boundary values
> (0 and 1) are also included. Because of the latter feature I cannot even
> build a beta regression, let alone a boosted variant of it.
> Nevertheless, I can think of my response as a binomial one with values
> between 0 and 1, taking 1 square meter (as if it were a pixel) of natural
> vegetation as an observation. This way I can fit binomial glms to my data,
> specifying the number of square meters of natural vegetation as weights
> (I round them to integers so that they can be used in glm). I hope I am
> allowed to post a side question here: I always get a warning with these
> glms. Here is a simple one-variable example:
>
> Call: tmp <- glm(ossz_ujstand2$k2_stand ~ BIO_1 + I(BIO_1^2),
>                  family = binomial, na.action = na.omit,
>                  weights = ossz_ujstand2$weights)
>
> where BIO_1 is a variable describing climate, and weights is a vector with
> the area of natural vegetation, rounded to an integer, for each observation.
>
> Warning: "non-integer #successes in a binomial glm!"
>
> I read somewhere on this site that this can be normal, but I would be
> reassured if someone could confirm that this is indeed so in my case as well.
>
> My problem with boosting is that I don't know how to handle the distribution
> of my response variable. I am not quite sure how to treat the loss function
> either. It seems to me that it somehow corresponds to the link function, as
> it needs to be defined by family(), like link functions in glm, and the
> potential choices for family also correspond. At the same time, some papers
> about boosting suggest to me that the loss function plays more the role of
> the curve estimation technique, and that data with any distribution can be
> boosted with any type of loss function.
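A worked equation may help with the link/loss question. Assume, for illustration, that f_i is the predictor being boosted (on the logit scale), n_i is the total natural-vegetation area at site i, and y_i in [0, 1] is the observed proportion of type A. The binomial negative log-likelihood is then defined for y_i = 0 and y_i = 1 as well, and its negative gradient is the quantity a gradient-boosting step fits:

```latex
\mu_i = \frac{1}{1 + e^{-f_i}}, \qquad
\ell_i(f_i) = -n_i\bigl[\,y_i \log \mu_i + (1 - y_i)\log(1 - \mu_i)\bigr]
            = -n_i\bigl[\,y_i f_i - \log\bigl(1 + e^{f_i}\bigr)\bigr],
\qquad
-\frac{\partial \ell_i}{\partial f_i} = n_i\,(y_i - \mu_i).
```

In this formulation the link (logit) and the loss are not independent choices: the loss is written in terms of mu_i, the inverse link of f_i. mboost's Family() constructor accepts a user-supplied negative gradient (ngradient) and loss; ?Family documents the exact argument conventions, which should be checked before relying on this.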
> As a start, I tried to do the same with boosting as I did with glms. Here
> is an example.
>
> With mboost:
> index <- !is.na(ossz_ujstand2$k2_stand)   # I need this to remove NAs
> proba.bb2 <- blackboost(k2_stand ~ BIO_1 + BIO_12,
>                         data = ossz_ujstand2[index, ],
>                         weights = ossz_ujstand2$weights[index],
>                         family = Binomial())
>
>
> Error in family@check_y(y) :
>   response is not a factor but ‘family = Binomial()’
>
>
> With gbm, using a modified version of the code of Elith et al. (2008,
> Journal of Animal Ecology):
>
> index <- !is.na(ossz_ujstand2$k2_stand)
> k2.tc5.lr01 <- gbm.step(data = ossz_ujstand2[index, ],
>                         gbm.x = 50:147,
>                         gbm.y = 27,
>                         family = "bernoulli",
>                         tree.complexity = 5,
>                         learning.rate = 0.1,
>                         bag.fraction = 0.75,
>                         weights = ossz_ujstand2$weights[index])
>
> Error in gbm.fit(x, y, offset = offset, distribution = distribution, w = w,  :
>   Bernoulli requires the response to be in {0,1}
>
>
> So obviously the solution with weights does not work. Is there a
> straightforward way to model my response with the prefabricated families,
> or do I have to write a new loss function? I understand that this is
> possible in mboost, but I would greatly appreciate support on how to do it.
> As you can see, I am even uncertain about what type of link I should use
> for my data.
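One workaround often used with learners that only accept a 0/1 response, sketched here with base R and simulated data: split each site into a "presence" row (y = 1, weight = area of type A) and an "absence" row (y = 0, weight = the remaining natural area). The weighted binomial likelihood of the expanded data equals that of the proportion model, which the glm comparison below checks. Because the expanded y is 0/1, it could in principle be given to gbm with the "bernoulli" distribution and these row weights, or to blackboost with factor(y) and family = Binomial(); that step is not shown or tested here. If cross-validation is used on the expanded data, the two rows of a site should probably stay in the same fold. All variable names are hypothetical.

```r
# Hypothetical simulated data, one row per site
set.seed(3)
n <- 100
x <- runif(n, -2, 2)
freqAZ <- sample(5:40, n, replace = TRUE)                   # total natural area per site
freqA <- rbinom(n, size = freqAZ, prob = plogis(-0.5 + x))  # area of type A

# Reference model: proportion response, totals as weights
p <- freqA / freqAZ
fit_prop <- glm(p ~ x, weights = freqAZ, family = binomial)

# Expanded ("long") data: for every site, a presence row (y = 1) weighted by
# the area of type A and an absence row (y = 0) weighted by the remaining area
long <- data.frame(
  site = rep(seq_len(n), times = 2),   # kept so both rows of a site can share a CV fold
  y    = rep(c(1, 0), each = n),
  x    = rep(x, times = 2),
  w    = c(freqA, freqAZ - freqA)
)
long <- long[long$w > 0, ]             # rows with zero weight carry no information

# A 0/1 response with case weights: same likelihood as the proportion model
fit_long <- glm(y ~ x, weights = w, family = binomial, data = long)

print(all.equal(coef(fit_prop), coef(fit_long), tolerance = 1e-6))
```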
>
> Thank you very much!
>
> Imelda Somodi
> Assistant research fellow
> Institute of Ecology and Botany
> Hungarian Academy of Science
>
> --
> View this message in context:
> http://www.nabble.com/Proportional-response-and-boosting-tp21559467p21559467.html
>
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
> **********************************************************************************
> The information transmitted is intended only for the person or entity to
> which it is addressed and may contain confidential and/or privileged
> material. Any review, retransmission, dissemination or other use of, or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipient is prohibited. If you received
> this in error, please contact the sender and delete the material from any
> computer. It is the policy of the Department of Justice, Equality and Law
> Reform and the Agencies and Offices using its IT services to disallow the
> sending of offensive material.
> Should you consider that the material contained in this message is
> offensive you should contact the sender immediately and also
> mailminder[at]justice.ie.
>
> ***********************************************************************************
--
View this message in context: http://www.nabble.com/Proportional-response-and-boosting-tp21559467p21564054.html
Sent from the R help mailing list archive at Nabble.com.