[R] which multivariate regression?
Denis Kazakiewicz
d.kazakiewicz at gmail.com
Tue Feb 8 23:42:43 CET 2011
Dear Andrew Halford
This is merely a suggestion from an R newbie without sufficient statistical
background: MCMCpoisson from the package MCMCpack.
Could you please post the answer to your very interesting question later,
once you have found it?
With best regards,
Denis
On Tue, 08/02/2011 at 14:38 +1000, Andrew Halford wrote:
> Hi R-Users,
>
> I have a student doing work with lionfish and she has been trying to analyse
> a multivariate dataset to see what variables/factors are influencing the
> behaviour of lionfish. We have attempted a number of analyses, including
> rpart, relaimpo and standard linear regression but we are not having much
> luck with quality output. The data is very non-normal and we would
> appreciate some advice on the best way to go about analysing it.
>
> Kathy has provided a synopsis below, along with part of the dataset.
>
> Any help/advice appreciated.
>
> I am stuck on a problem with a dataset from a behavior study on Indo-Pacific
> lionfish *Pterois volitans*. The idea is to find out whether lionfish behave
> differently at different locations and times of day and whether these
> differences can be accounted for by any of the explanatory variables
> measured.
> My response variable is a series of behavior categories: (1) rest, (2)
> passive hunting and (3) active hunting. I have chosen to treat them
> individually because each one has a different biological importance, so
> basically I am trying to come up with an answer for 3 response variables.
> Measurement for these behavior categories is proportion of time (10 minute
> observation) spent at the activity described and values range from 0 to 1.
> There are six explanatory variables, a mix of categorical and continuous:
> Region (Guam and Philippines), Hours after Sunrise, Habitat (5
> categories), Weather (3 categories), Current (3 categories) and Lionfish
> Size (cm).
>
> The following is an example of the dataset for response variable Rest (R)
>
> R      REG   HAS          HAB                 WE   CU   SI
> 0.05   0     11.0166667   Artificial          2    0    10
> 0.05   0     0.56666667   Rock_boulder_cave   1    1    11
> 0.05   0     9.13333333   Artificial          1    1    18
> 0.1    0     4.2          Sand_rubble         1    2    20
> 0.1    0     9.13333333   Rock_boulder_cave   1    2    10
> 0.1    0     9.6          Sand_rubble         0    0    7
> 0.1    0     0.78333333   Rock_boulder_cave   1    0    31
> 0.1    0     1.28333333   Artificial          1    0    20
> 0.1    0     10.8666667   Coral               1    0    22
> 0.15   0     10.4166667   Coral               0    1    27
> 0.2    0     3.46666667   Rock_boulder_cave   0    0    8
> 0.2    0     1.23333333   Rock_boulder_cave   1    0    25
> 0.45   1     11.6833333   Coral               2    0    15
> 0.5    1     11.0166667   Artificial          1    2    14
> 0.5    1     11.9166667   Artificial          0    0    14
> 0.5    1     9.53333333   Artificial          1    0    24
> 0.5    1     9.83333333   Artificial          1    0    15
> 0.5    1     11.5833333   Rock_boulder_cave   1    1    29
> 0.53   1     5.91666667   Coral               1    1    15
> 0.6    1     11.0166667   Artificial          1    2    17
> 0.6    1     9.78333333   Rock_boulder_cave   0    0    12
> 0.6    1     4.68333333   Sand_rubble         2    0    14
> 0.6    1     5.01666667   Rock_boulder_cave   2    0    16
> 0.6    1     3.18333333   Artificial          2    1    19
> 0.65   1     5.25         Coral               2    0    15
> 0.65   1     9.63333333   Sand_rubble         1    1    17
>
> As you can see here, I have converted the categorical variables region,
> current and weather to numerical: region because it can be expressed in
> binary form, and the other two because they represent a quantity. For habitat
> I have created a dummy variable based on deviation coding and introduced it
> as a variable in my model.
> Total sample size is 357, where each sample is an observation at a
> particular time of day. My response variable is not normally distributed:
> its histogram has a bit of a U-shape with lots of 0s and 1s, which means
> the animal was either completely engaged in that activity during the 10 min.
> observation or didn't show it at all. I have tried a series of
> transformations to normalize it (log, log(x+1), ln, sqrt, fourth root) but
> have been unsuccessful.
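One way to see why none of these transformations can help: log(x+1), sqrt and fourth root are all monotone, so every exact 0 is sent to one value and every exact 1 to another, and the spikes at the two ends survive unchanged. The sketch below illustrates this on simulated proportions, not on the lionfish data. The variable name `prop`, the 30/30/40 mix and the Beta(2, 2) middle part are assumptions made up for illustration.

```r
# Simulated stand-in for a proportion-of-time response with many exact 0s and 1s.
# The mixing weights and the Beta(2, 2) interior are illustrative assumptions.
set.seed(1)
n <- 357
kind <- sample(c("zero", "one", "mid"), n, replace = TRUE, prob = c(0.3, 0.3, 0.4))
prop <- ifelse(kind == "zero", 0, ifelse(kind == "one", 1, rbeta(n, 2, 2)))

# Share of observations sitting exactly on the smallest or largest value.
boundary_share <- function(x) mean(x == min(x) | x == max(x))

# The share is the same before and after each monotone transformation.
shares <- c(
  raw         = boundary_share(prop),
  sqrt        = boundary_share(sqrt(prop)),
  fourth_root = boundary_share(prop^0.25),
  log1p       = boundary_share(log(prop + 1))
)
print(round(shares, 3))
```

Plotting `hist(sqrt(prop))` next to `hist(prop)` shows the same picture: the middle of the distribution is reshaped, but the two end bars keep their height.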
> What type of analyses have I tried?
> (1) Regression trees.
> Using categorical variables as categorical without changing them into
> numerical. This was coded with package rpart and is the preferred analysis
> due to ease of interpretation. The response variable was untransformed and
> the distribution chosen was Poisson. The result was a tree with immediately
> increasing error (cp), which picked 0 splits as the best tree.
>
> (2) Multiple regression
> Tried using package relaimpo to obtain a ranking of the
> importance of explanatory variables. Used different transformations to
> analyze residuals and in all cases obtained a weird looking set of residuals
> with a portion normally distributed and another portion clustered to the
> side, giving the whole graph a clear trend (my guess is these are all the 1s
> and 0s in the data).
> I also tried generalized linear models (glm and package pscl): Poisson,
> negative binomial and zero-inflated negative binomial. In all cases fit
> seemed adequate but variance explained was very small and the coefficients
> estimated for my EVs were very low.
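For a response that is a proportion between 0 and 1 rather than a count, one option sometimes used in place of Poisson or negative binomial models is a logit-link GLM with the quasi-binomial family, which accepts non-integer proportions including exact 0s and 1s. The following is only a sketch on simulated data, not an analysis proposed anywhere in this thread. Column names follow the posted example; the simulated effects, the four habitat levels taken from the excerpt, and the use of `contr.sum` for the deviation coding are assumptions.

```r
# Simulated data with the same column layout as the posted example (values are made up).
set.seed(42)
n <- 357
d <- data.frame(
  REG = factor(sample(c("Guam", "Philippines"), n, replace = TRUE)),
  HAS = runif(n, 0, 12),
  HAB = factor(sample(c("Artificial", "Coral", "Rock_boulder_cave", "Sand_rubble"),
                      n, replace = TRUE)),
  WE  = factor(sample(0:2, n, replace = TRUE)),
  CU  = factor(sample(0:2, n, replace = TRUE)),
  SI  = round(runif(n, 7, 31))
)

# Deviation (sum-to-zero) coding for habitat, as described in the post.
contrasts(d$HAB) <- contr.sum(nlevels(d$HAB))

# Invented "true" effects; the beta draw makes the response U-shaped with many 0s and 1s.
mu  <- plogis(-0.5 + 0.15 * d$HAS + 0.8 * (d$REG == "Philippines"))
p   <- rbeta(n, mu, 1 - mu)
d$R <- rbinom(n, size = 20, prob = p) / 20

# Logit-link GLM for a proportion response; quasibinomial estimates the dispersion.
fit <- glm(R ~ REG + HAS + HAB + WE + CU + SI,
           family = quasibinomial(link = "logit"), data = d)
print(summary(fit)$coefficients)

# Fitted values stay on the 0-1 scale of the response.
print(range(fitted(fit)))
```

Coefficients are on the log-odds scale, so `plogis()` or `predict(fit, type = "response")` is needed to read them as proportions of time.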
>
> Any ideas??? Lastly, I have used Primer to analyze the response variable
> against each EV individually. That works well but limits my
> conclusions and doesn't allow me to account for variation in one of the EVs
> affecting others. I appreciate any help I can get.
>