[R-sig-ME] thoughts on variable importance

Farrar, David Farrar.David at epa.gov
Mon Mar 26 21:31:21 CEST 2018


The design is too complicated for purposes here (BTW, I didn’t design the study). For simplicity, let’s say you obtain samples of “material” from several “sources,” and you measure “something” on these samples at two facilities. Each “source” is measured multiple times at a single facility. The sources can differ in some identifiable aspects of their procedures for generating material. There is a continuous variable that varies in an uncontrolled manner across measurements from a given source, modelled as a fixed-effects regressor. It cannot be assumed that one can account for all variation among the sources; therefore there is a random factor for “source,” in addition to fixed effects for identified differences in procedure.

In the event, “material” is treated sewage.  The regressor is wind speed in a field study of aerosolizing of bacterial endotoxins.

This is kind of messy for those variables, but the study is actually pretty good for another purpose, namely comparing measurement devices. These are paired, in particular for wind speed. However, we tried to characterize the importance of other variables in relative terms. It appeared to me that, under the conditions of the study, wind speed was more important than variation among sources.
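
One rough way to put a fixed regressor and a random factor on a common scale is to compare their variance contributions: the variance of the fixed-effect term (slope squared times the variance of the regressor) against the estimated among-source variance. The sketch below is only an illustration of that arithmetic, not the analysis from the paper. The function name `variance_shares` and every number (slope, wind speeds, standard deviations) are hypothetical placeholders, and the calculation assumes a single regressor that is uncorrelated with the source effects.

```python
import statistics


def variance_shares(slope, x_values, sd_source, sd_residual):
    """Split the modelled variance into regressor, source, and residual shares.

    Assumes one fixed regressor, a random intercept for source, and
    independence between the regressor and the source effects.
    """
    # Variance contributed by the fixed regressor: slope^2 * Var(x).
    var_fixed = slope ** 2 * statistics.pvariance(x_values)
    var_source = sd_source ** 2
    var_resid = sd_residual ** 2
    total = var_fixed + var_source + var_resid
    return {
        "wind_speed": var_fixed / total,
        "source": var_source / total,
        "residual": var_resid / total,
    }


# Hypothetical inputs, NOT estimates from the study.
wind = [i * 0.5 for i in range(21)]  # wind speeds from 0 to 10 in steps of 0.5
shares = variance_shares(slope=0.5, x_values=wind, sd_source=0.8, sd_residual=1.0)

for name, share in shares.items():
    print(f"{name}: {share:.2f}")
```

In practice the slope and the two standard deviations would be taken from the fitted mixed model, and the share for the regressor depends on the range of wind speeds actually observed, so the comparison holds only "under the conditions of the study."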

R. F. Herrmann, R. J. Grosser, D. Farrar, R. B. Brobst. 2017. Field studies measuring the aerosolization of endotoxin during the land application of Class B biosolids. Aerobiologia 33:417–434.

From: Thierry Onkelinx [mailto:thierry.onkelinx at inbo.be]
Sent: Monday, March 26, 2018 9:42 AM
To: Farrar, David <Farrar.David at epa.gov>
Cc: Ben Bolker <bbolker at gmail.com>; r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] thoughts on variable importance

Dear David,

You are still very vague on the design, which makes it much harder to answer your question. Can you please give us a more focused question, and provide some dummy data and a model?

Best regards,


ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx at inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be
///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////

2018-03-26 15:07 GMT+02:00 Farrar, David <Farrar.David at epa.gov>:

Sorry, yes. I think my coffee had not taken effect. The broad issue is the comparison of variable importance when some variables have been modeled as fixed and others as random. In my case the variables of most interest were modeled as fixed, and some nuisance variables (as I saw them) were modeled as random. I thought this was crude but possibly good enough for the situation, but I wondered whether there was any interest in discussing this, or whether there are more refined methods that I might have considered. The actual analysis is already done and was published some time ago. I didn't want to go into more detail because I did not want to focus on the modeling. I intended to be a little vague in order to cast a wide net.

-----Original Message-----
From: Ben Bolker [mailto:bbolker at gmail.com]
Sent: Monday, March 26, 2018 8:54 AM
To: Farrar, David <Farrar.David at epa.gov>
Cc: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] thoughts on variable importance

Methodology questions are fine, but can you spell out your question a bit more?

On Mon, Mar 26, 2018 at 8:34 AM, Farrar, David <Farrar.David at epa.gov> wrote:
> There is probably some tendency for studies to be planned so that the variables thought to be most important can be evaluated as fixed effects, as was done in my case.
> For the analysis of a small environmental field study, I eyeballed the BLUPs for a few nuisance variables and noted that they did not suggest effects as large as those of the variables that interested us most. Thoughts?
> My apologies if a methodology question is not favored here. I did not think it was a question about introductory mixed models.
> Regards,
> David
>
> David Farrar, Ph.D., Biostatistician
> USEPA ORD NCEA BRAB Cincinnati, OH
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models