[R] which multivariate regression?

Denis Kazakiewicz d.kazakiewicz at gmail.com
Tue Feb 8 23:42:43 CET 2011


Dear Andrew Halford,
This is merely a suggestion from an R newbie without sufficient statistical
background:


MCMCpoisson from package MCMCpack


Could you please post the answer to your very interesting question
later, once you find it?
With best regards,
Denis

On Tue, 08/02/2011 at 14:38 +1000, Andrew Halford wrote:
> Hi R-Users,
> 
> I have a student doing work with lionfish and she has been trying to analyse
> a multivariate dataset to see what variables/factors are influencing the
> behaviour of lionfish. We have attempted a number of analyses, including
> rpart, relaimpo and standard linear regression but we are not having much
> luck with quality output. The data is very non-normal and we would
> appreciate some advice on the best way to go about analysing it.
> 
> Kathy has provided a synopsis below along with part of the dataset.
> 
> Any help/advice appreciated.
> 
>   I am stuck on a problem with a dataset from a behavior study of Indo-Pacific
> lionfish *Pterois volitans*. The idea is to find out whether lionfish behave
> differently at different locations and times of day and whether these
> differences can be accounted for by any of the explanatory variables
> measured.
>   My response variable is a series of behavior categories: (1) rest, (2)
> passive hunting and (3) active hunting. I have chosen to treat them
> individually because each one has a different biological importance, so
> basically I am trying to come up with an answer for 3 response variables.
> Each behavior is measured as the proportion of a 10-minute observation spent
> on that activity, so values range from 0 to 1. There are six explanatory
> variables, a mix of categorical and continuous: Region (Guam and
> Philippines), Hours after Sunrise, Habitat (5 categories), Weather (3
> categories), Current (3 categories) and Lionfish Size (cm).
> 
>   The following is an example of the dataset for response variable Rest (R)
> 
>   R     REG  HAS         HAB                WE  CU  SI
>   0.05  0    11.0166667  Artificial         2   0   10
>   0.05  0    0.56666667  Rock_boulder_cave  1   1   11
>   0.05  0    9.13333333  Artificial         1   1   18
>   0.1   0    4.2         Sand_rubble        1   2   20
>   0.1   0    9.13333333  Rock_boulder_cave  1   2   10
>   0.1   0    9.6         Sand_rubble        0   0   7
>   0.1   0    0.78333333  Rock_boulder_cave  1   0   31
>   0.1   0    1.28333333  Artificial         1   0   20
>   0.1   0    10.8666667  Coral              1   0   22
>   0.15  0    10.4166667  Coral              0   1   27
>   0.2   0    3.46666667  Rock_boulder_cave  0   0   8
>   0.2   0    1.23333333  Rock_boulder_cave  1   0   25
>   0.45  1    11.6833333  Coral              2   0   15
>   0.5   1    11.0166667  Artificial         1   2   14
>   0.5   1    11.9166667  Artificial         0   0   14
>   0.5   1    9.53333333  Artificial         1   0   24
>   0.5   1    9.83333333  Artificial         1   0   15
>   0.5   1    11.5833333  Rock_boulder_cave  1   1   29
>   0.53  1    5.91666667  Coral              1   1   15
>   0.6   1    11.0166667  Artificial         1   2   17
>   0.6   1    9.78333333  Rock_boulder_cave  0   0   12
>   0.6   1    4.68333333  Sand_rubble        2   0   14
>   0.6   1    5.01666667  Rock_boulder_cave  2   0   16
>   0.6   1    3.18333333  Artificial         2   1   19
>   0.65  1    5.25        Coral              2   0   15
>   0.65  1    9.63333333  Sand_rubble        1   1   17
>
>    As you can see here I have converted categorical variables region,
> current and weather to numerical; region because it can be expressed in
> binary form and the other two because they represent a quantity. For habitat
> I have created a dummy variable based on deviation coding, and introduced it
> as a variable in my model.
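For reference, a minimal R sketch of the coding described above. The data frame name `d` is hypothetical, and its four rows are copied from the sample table, so only four of the five habitat levels appear here. Instead of building dummy columns by hand, it keeps habitat as a factor and requests deviation (sum-to-zero) coding with `contr.sum`; whether this matches the coding actually used is an assumption.

```r
# Hypothetical data frame `d`; the four rows are copied from the sample table.
d <- data.frame(
  R   = c(0.05, 0.05, 0.10, 0.45),
  REG = c(0, 0, 0, 1),
  HAS = c(11.0166667, 0.56666667, 4.2, 11.6833333),
  HAB = c("Artificial", "Rock_boulder_cave", "Sand_rubble", "Coral"),
  WE  = c(2, 1, 1, 2),
  CU  = c(0, 1, 2, 0),
  SI  = c(10, 11, 20, 15)
)

# Keep habitat categorical and ask for deviation (sum-to-zero) coding.
d$HAB <- factor(d$HAB)
contrasts(d$HAB) <- contr.sum(nlevels(d$HAB))

# One row per habitat level; each contrast column sums to zero.
hab_codes <- contrasts(d$HAB)
print(hab_codes)
```

Model functions such as `lm` and `glm` pick up these contrasts automatically when `HAB` appears in the formula.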
>    Total sample size is 357, where each sample is an observation at a
> particular time of day. My response variable is not normally distributed:
> its histogram is somewhat U-shaped, with lots of 0s and 1s, meaning the
> animal was either completely engaged in that activity during the 10 min.
> observation or didn't show it at all. I have tried a series of
> transformations to normalize it but have been unsuccessful (log, log(x+1),
> ln, sqrt, fourth root).
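A short R sketch of why such transformations cannot help, using simulated proportions rather than the lionfish data (the mixture weights and the seed are assumptions for illustration). A monotone transformation only relabels values: every exact 0 maps to one value and every exact 1 to another, so the two spikes survive.

```r
set.seed(1)  # arbitrary seed, illustration only
n <- 357     # sample size quoted in the post

# Simulated U-shaped proportions: a spike at 0, a spike at 1, the rest in between.
kind <- sample(c("zero", "one", "mid"), n, replace = TRUE,
               prob = c(0.3, 0.3, 0.4))
y <- ifelse(kind == "zero", 0, ifelse(kind == "one", 1, runif(n)))

# Three of the transformations mentioned above
# (log(0) is -Inf, so plain log is left out).
transformed <- list(log1p = log1p(y), sqrt = sqrt(y), fourth_root = y^0.25)

# Share of observations sitting on the two most common values.
spike_share <- function(v) {
  sum(sort(table(v), decreasing = TRUE)[1:2]) / length(v)
}
shares <- c(raw = spike_share(y), sapply(transformed, spike_share))
print(round(shares, 3))
```

The share is the same before and after each transformation, which is consistent with none of them producing a normal-looking histogram.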
>     What type of analyses have I tried?
> (1) Regression trees.
>      Using categorical variables as categorical, without converting them to
> numerical. This was coded with package rpart and is the preferred analysis
> due to ease of interpretation. The response variable was untransformed and
> the distribution chosen was Poisson. The result was a tree with immediately
> increasing error (cp), for which 0 splits was picked as the best tree.
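A minimal sketch of the kind of rpart call described in (1), run on simulated data. The data frame `sim` and its columns are hypothetical stand-ins, and the call actually used is not shown in this thread. `printcp` prints the complexity table, one row per candidate tree size; a table whose best row has `nsplit` equal to 0 corresponds to the "0 splits" result reported above.

```r
library(rpart)

set.seed(1)  # arbitrary seed, illustration only
n <- 357
sim <- data.frame(
  R   = sample(c(0, 0.5, 1), n, replace = TRUE),  # made-up response
  REG = factor(sample(c("Guam", "Philippines"), n, replace = TRUE)),
  HAS = runif(n, 0, 12),
  SI  = round(runif(n, 7, 31))
)

# Categorical predictors stay as factors; the response is untransformed;
# method = "poisson" selects rpart's Poisson splitting rule.
fit <- rpart(R ~ REG + HAS + SI, data = sim, method = "poisson")

# Complexity table: the first row is always the root-only tree (nsplit = 0).
printcp(fit)
cp_table <- fit$cptable
```

Because the simulated response is unrelated to the predictors, any splits found here are noise; the point is only the shape of the call and of the table.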
> 
> (2) Multiple regression
>     Tried using package relaimpo to obtain a classification on the
> importance of explanatory variables. Used different transformations to
> analyze residuals and in all cases obtained a weird looking set of residuals
> with a portion normally distributed and another portion clustered to the
> side, giving the whole graph a clear trend (my guess is these are all the 1s
> and 0s in the data).
>     I also tried non-linear regressions (glm) with package pscl (Poisson,
> negative binomial and zero-inflated negative binomial). In all cases fit
> seemed adequate but variance explained was very small and the coefficients
> estimated for my EVs were very low.
> 
>    Any ideas??? Lastly, I have used Primer to analyze the response variable
> against each EV individually. That works well but limits my
> conclusions and doesn't allow me to account for variation in one of the EVs
> affecting others. I appreciate any help I can get,
>