[R] Using cforest on a hierarchically structured dataset

hmh hugomh at gmx.fr
Sat Nov 18 19:55:09 CET 2017


Hi,


I am working with a hierarchically structured dataset, and I am not sure
of the right way to analyze it with cforest, if there is one.

- - BACKGROUND & PROBLEM

We are analyzing the behavior of some social birds facing different 
temperature conditions.


The behaviors of the birds were recorded during many 2-hour sessions.


Conditional random forests (cforest) are well suited to this analysis
because we have a large number of variables describing the temperature
during the 2 hours; these variables are fairly highly correlated, and we
expect them to have nonlinear effects on the behavior.


For the other behaviors, we recorded the frequency for each individual
in each 2-hour session.

For each 2-hour session we have only one value per temperature variable,
since these variables are, for example, the minimum, maximum and median
temperature, and different measures of the variance of the temperature.


Schematically, the dataset therefore looks like this:

Y_behaviour_frequency   Individual   Session   X1   X2   X3   ...
                  0.5         ind1        S1    5   10    7   ...
                 0.55         ind2        S1    5   10    7   ...
                  0.2         ind3        S1    5   10    7   ...
                  ...          ...        S1    5   10    7   ...
                  0.3         ind1        S2   15    7   50   ...
                 0.01         ind5        S2   15    7   50   ...
                  ...          ...        S2   15    7   50   ...
                  0.4         ind1        S3    2    8    5   ...
                 0.05         ind3        S3    2    8    5   ...
                  0.1         ind4        S3    2    8    5   ...
                  0.2         ind5        S3    2    8    5   ...
                  ...          ...        S3    2    8    5   ...
                  ...          ...       ...  ...  ...  ...   ...

If I run a standard cforest on this dataset, modeling
Y_behaviour_frequency as a function of Individual, Session and all the X
variables, I obtain conditional relative importances similar to those in
the attached plot:

They are all very low, but none is negative. The absence of negative
conditional relative importances is a problem, since we were selecting
variables using a threshold of minus two times the lowest conditional
relative importance.
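Reading the rule literally as threshold = -2 * min(importance), a small sketch shows why the missing negative values matter: a negative minimum gives a positive cut-off, whereas an all-positive set gives a negative cut-off that every variable passes, so nothing is filtered out. The sketch is in Python purely for illustration; `select_by_threshold` and the importance values are hypothetical, and in R the importances themselves would come from party's `varimp(..., conditional = TRUE)`.

```python
# Hypothetical helper illustrating the selection rule as described in the post:
# keep a variable if its conditional importance exceeds -2 * (lowest importance).
def select_by_threshold(importances):
    lowest = min(importances.values())
    threshold = -2 * lowest
    selected = sorted(name for name, vi in importances.items() if vi > threshold)
    return threshold, selected


# Made-up importance values, for illustration only.
with_negative = {"X1": 0.030, "X2": 0.004, "X3": -0.006}
all_positive = {"X1": 0.0003, "X2": 0.0001, "X3": 0.00002}

print(select_by_threshold(with_negative))  # positive cut-off (0.012): only X1 kept
print(select_by_threshold(all_positive))   # negative cut-off: every variable kept
```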


- - QUESTIONS

1) Have you ever faced any of these situations:

     - all conditional relative importances very low

     - all conditional relative importances positive

     - a hierarchically structured dataset analyzed with cforest?


I suspect, though I am not sure, that the very low but all-positive
conditional importances come from the hierarchical structure of the dataset:

Random forests are built on bootstrap samples, and when the rows are
bootstrapped at each iteration, all or almost all of the 2-hour sessions
end up in every sample, even though sessions are the main source of
variation.
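A back-of-envelope check of this, assuming n rows in total and k_s observed individuals in session s (rows drawn uniformly): a row-level bootstrap of size n leaves out every row of session s with probability

$$
P(\text{session } s \text{ absent}) = \left(1 - \frac{k_s}{n}\right)^{n} \approx e^{-k_s}.
$$

For a hypothetical session with k_s = 5 individuals this is e^{-5} ≈ 0.0067, so the session appears in more than 99% of the samples. Subsampling 63.2% of the rows without replacement gives nearly the same figure, since the chance of missing all k_s rows is then roughly (1 - 0.632)^{k_s} ≈ e^{-k_s}. The out-of-bag rows of almost every tree therefore share their session, and hence identical temperature values, with in-bag rows.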

The bootstrap itself would need to be hierarchical: first sampling the
sessions, then sampling individuals within each sampled 2-hour session.
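A minimal sketch of such a two-stage (cluster) bootstrap, written in Python for illustration only; `hierarchical_bootstrap` is a hypothetical helper, not part of cforest or any package. Sessions that are never drawn form a natural out-of-bag set, so importances could be assessed on sessions a tree has not seen. Whether and how such samples can be fed into cforest (for example as case weights) is not shown here and would need checking against the package documentation.

```python
import random
from collections import defaultdict


def hierarchical_bootstrap(rows, session_key="Session", seed=None):
    """One two-stage bootstrap sample: resample sessions, then rows within them.

    rows: list of dicts, one per (individual, session) observation.
    Returns (sample, oob_sessions): the resampled rows (with duplicates) and
    the set of sessions that were never drawn (a session-level out-of-bag set).
    """
    rng = random.Random(seed)
    by_session = defaultdict(list)
    for row in rows:
        by_session[row[session_key]].append(row)
    sessions = list(by_session)

    # Stage 1: draw sessions with replacement, as many as there are sessions.
    drawn = rng.choices(sessions, k=len(sessions))
    sample = []
    for s in drawn:
        members = by_session[s]
        # Stage 2: draw individuals within the chosen session, with replacement.
        sample.extend(rng.choices(members, k=len(members)))
    oob_sessions = set(sessions) - set(drawn)
    return sample, oob_sessions


# Rows from the example table (temperature columns omitted for brevity).
rows = [
    {"Individual": "ind1", "Session": "S1", "Y": 0.5},
    {"Individual": "ind2", "Session": "S1", "Y": 0.55},
    {"Individual": "ind3", "Session": "S1", "Y": 0.2},
    {"Individual": "ind1", "Session": "S2", "Y": 0.3},
    {"Individual": "ind5", "Session": "S2", "Y": 0.01},
    {"Individual": "ind1", "Session": "S3", "Y": 0.4},
    {"Individual": "ind3", "Session": "S3", "Y": 0.05},
    {"Individual": "ind4", "Session": "S3", "Y": 0.1},
    {"Individual": "ind5", "Session": "S3", "Y": 0.2},
]
sample, oob = hierarchical_bootstrap(rows, seed=1)
print(len(sample), sorted(oob))
```

Resampling whole sessions first is what lets some sessions drop out entirely; the second stage only adds within-session variability.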


2) It is easy to perform this kind of hierarchical bootstrap in R, but
have you ever heard of it being used inside a random forest?

The question was asked 4 years ago:

here: 
https://stats.stackexchange.com/questions/62840/random-forest-and-cluster-level-bootstrapping

and here: 
https://stats.stackexchange.com/questions/93156/random-forest-on-multi-level-hierarchical-structured-data

but the most promising lead, the "hie-ran-forest" project (also called
"HieRanFor"), seems to have been abandoned
(https://r-forge.r-project.org/R/?group_id=2021).



Thanks for your help,

cheers.


hugo


-- 

Hugo Mathé-Hubert

BU-G19

postdoc

eawag (Swiss Federal Institute of Aquatic Science and Technology)
Evolutionary Ecology
<http://www.eawag.ch/en/department/eco/main-focus/evolutionary-ecology/>
About me
<http://www.eawag.ch/en/aboutus/portrait/organisation/staff/profile/hugo-mathe-hubert/>

Überlandstrasse 133
P.O.Box 611
8600 Dübendorf, Switzerland

- - - - - - - - - - - - - - - - - -

Thoughts appear from doubts and die in convictions. Therefore, doubts 
are an indication of strength and convictions an indication of weakness. 
Yet, most people believe the opposite.

- - - - - - - - - - - - - - - - - -


-------------- next part --------------
Attachment scrubbed from the archive: Rplot.pdf (application/pdf, 5040 bytes)
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20171118/9b88135c/attachment.pdf>

