[R] Linear Model with Discrete Data
David Winsemius
dwinsemius at comcast.net
Fri Jun 14 02:00:46 CEST 2013
On Jun 13, 2013, at 2:21 PM, Bert Gunter wrote:
> Lorenzo:
>
> 1. This is a statistics question, not an R question.
>
> 2. Your statistical background appears inadequate -- it looks like
> Poisson regression, which would fall under "generalized linear
> models". But it depends on how "discrete" discrete is (on some level,
> all measurements are discrete, discretized to the resolution of the
> measurement process).
There is an excellent R vignette on handling count data by Achim Zeileis, Christian Kleiber, and Simon Jackman. It is easy to find with a Google search.
There's also a somewhat older but possibly useful resource: a set of worked S/R examples by Laura Thompson to accompany Agresti's text on categorical data. It is also easy to find on Google.
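For orientation only, here is a minimal sketch of what a Poisson regression looks like in R next to an ordinary lm() fit. The data are simulated (the variable names, sample size, and coefficients are invented for illustration and have nothing to do with the poster's file), so treat it as a starting point rather than a recommended analysis.

```r
# Simulated counts; every value here is invented for illustration only.
set.seed(42)
n <- 200
x <- rpois(n, lambda = 3)                    # a discrete predictor
y <- rpois(n, lambda = exp(0.2 + 0.3 * x))   # count response on a log scale

# Ordinary least squares treats y as continuous with constant variance.
fit_lm <- lm(y ~ x)

# Poisson regression models log(E[y]) as a linear function of x.
fit_pois <- glm(y ~ x, family = poisson(link = "log"))
print(coef(summary(fit_pois)))

# Rough overdispersion check: residual deviance over residual df.
# Values well above 1 suggest trying family = quasipoisson or MASS::glm.nb.
dispersion <- deviance(fit_pois) / df.residual(fit_pois)
print(dispersion)
```

Whether the Poisson family is appropriate depends on the data, for example on how large the counts are and whether there are excess zeros, which is exactly the kind of question the resources above address.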
--
David.
>
> 3. So I would advise seeking local statistical help. Getting
> statistical advice remotely over the internet (even on a proper forum
> for statistical advice, which this is not) is fraught with hazard and
> the risk of bad science (not due to incompetence or maliciousness;
> just due to the possibilities of misunderstanding and confusion) --
> imho only, of course.
>
> Of course, feel free to reject this and proceed at your own risk.
>
> Cheers,
> Bert
>
>
>
> On Thu, Jun 13, 2013 at 1:49 PM, Lorenzo Isella
> <lorenzo.isella at gmail.com> wrote:
>> Dear All,
>> I am struggling with a linear model and an allegedly trivial data set.
>> The data set does not consist of categorical variables, but rather of
>> numerical discrete variables (essentially, they count the number of times
>> that something happened).
>> Can I still use a standard linear regression, i.e. something like lm(y~x)?
>> I attach a small snippet that illustrates the difficulties that I am
>> experiencing (I do not understand why R complains about a list()).
>> Any suggestion is appreciated.
>> The data file can be downloaded from
>>
>> http://db.tt/hEKv1wH2
>>
>> Cheers
>>
>> Lorenzo
>>
>>
>> #####################################
>>
>> data <- read.csv("testData.csv", header=TRUE)
>>
>>
>> data <- subset(data,select= -c (X100, X182))
>>
>>
>> y <- data$X358
>>
>> z <- subset(data, select=-c(X358))
>>
>> myLM <- lm(y~z)
>>
>>
>> #####################
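On the list() complaint in the snippet above: z is a data frame, which R stores as a list, and a formula term has to be a vector or a matrix. A minimal sketch of one way around it follows. Since testData.csv is not reproduced here, it uses a small made-up data frame; X358 is the response column from the quoted code, while X101 and X205 are hypothetical column names.

```r
# Stand-in for testData.csv; the counts and the X101/X205 names are made up.
set.seed(1)
data <- data.frame(X101 = rpois(50, 2),
                   X205 = rpois(50, 4),
                   X358 = rpois(50, 3))

y <- data$X358
z <- subset(data, select = -c(X358))

# A data frame is a list, so it cannot be used as a single formula term;
# try() captures the resulting error instead of stopping the script.
res <- try(lm(y ~ z), silent = TRUE)
print(inherits(res, "try-error"))

# Keep everything in one data frame; "." means "all other columns".
myLM <- lm(X358 ~ ., data = data)
print(coef(myLM))

# The same formula interface carries over to a count model.
myGLM <- glm(X358 ~ ., data = data, family = poisson)
print(coef(myGLM))
```

Passing as.matrix(z) as the term would also get past the error, but the data= form keeps the column names in the coefficient table and works unchanged with glm().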
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
>
David Winsemius
Alameda, CA, USA