[R] Linear Model with Discrete Data

David Winsemius dwinsemius at comcast.net
Fri Jun 14 02:00:46 CEST 2013


On Jun 13, 2013, at 2:21 PM, Bert Gunter wrote:

> Lorenzo:
> 
> 1. This is a statistics question, not an R question.
> 
> 2. Your statistical background appears inadequate  -- it looks like
> Poisson regression, which would fall under "generalized linear
> models". But it depends on how "discrete" discrete is (on some level,
> all measurements are discrete, discretized to the resolution of the
> measurement process).

There is an excellent R vignette on handling count data by Achim Zeileis, Christian Kleiber, and Simon Jackman. It is easy to find with a Google search.

There is also a somewhat older, but possibly useful, resource: a set of worked S/R examples by Laura Thompson to accompany Agresti's text on categorical data. It is also easy to find on Google.
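
As a rough illustration of the Poisson-regression route Bert mentions, here is a minimal sketch on simulated count data. The variable names `x` and `y` and the "true" coefficients are invented for the example and have nothing to do with Lorenzo's file. It fits `glm()` with `family = poisson` (log link by default) alongside an ordinary `lm()` for comparison.

```r
# Minimal sketch: Poisson regression for a count response.
# The data are simulated; 'x' and 'y' are hypothetical names.
set.seed(1)
n  <- 200
x  <- rpois(n, lambda = 3)          # a discrete (count) predictor
mu <- exp(0.2 + 0.3 * x)            # true mean, linear on the log scale
y  <- rpois(n, lambda = mu)         # count response

# Ordinary least squares runs without complaint, but it assumes
# constant variance and can produce negative fitted counts.
fit_lm <- lm(y ~ x)

# Poisson GLM with the default log link.
fit_pois <- glm(y ~ x, family = poisson)
summary(fit_pois)

# Coefficients are on the log scale; exp() turns them into
# multiplicative effects on the expected count per unit of x.
exp(coef(fit_pois))

# Rough overdispersion check: if the Poisson variance assumption is
# adequate, the residual deviance should be near its degrees of freedom.
deviance(fit_pois) / df.residual(fit_pois)
```

If that ratio comes out well above 1 on real data, the vignette mentioned above discusses alternatives such as quasi-Poisson and negative binomial models.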

-- 
David.
> 
> 3. So I would advise seeking local statistical help. Getting
> statistical advice remotely over the internet (even on a proper forum
> for statistical advice, which this is not) is fraught with hazard and
> the risk of bad science (not due to incompetence or maliciousness;
> just due to the possibilities of misunderstanding and confusion) --
> imho only, of course.
> 
> Of course, feel free to reject this and proceed at your own risk.
> 
> Cheers,
> Bert
> 
> 
> 
> On Thu, Jun 13, 2013 at 1:49 PM, Lorenzo Isella
> <lorenzo.isella at gmail.com> wrote:
>> Dear All,
>> I am struggling with a linear model and an allegedly trivial data set.
>> The data set does not consist of categorical variables, but rather of
>> numerical discrete variables (essentially, they count the number of times
>> that something happened).
>> Can I still use a standard linear regression, i.e. something like lm(y~x)?
>> I attach a small snippet that illustrates the difficulties that I am
>> experiencing (I do not understand why R complains about a list()).
>> Any suggestion is appreciated.
>> The data file can be downloaded from
>> 
>> http://db.tt/hEKv1wH2
>> 
>> Cheers
>> 
>> Lorenzo
>> 
>> 
>> #####################################
>> 
>> data <- read.csv("testData.csv", header=TRUE)
>> 
>> 
>> data <- subset(data,select= -c (X100, X182))
>> 
>> 
>> y <- data$X358
>> 
>> z <- subset(data, select=-c(X358))
>> 
>> myLM <- lm(y~z)
>> 
>> 
>> #####################
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 
> 
> 
> -- 
> 
> Bert Gunter
> Genentech Nonclinical Biostatistics
> 
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

David Winsemius
Alameda, CA, USA