[R] fitted.values less than observed values

Gavin Simpson gavin.simpson at ucl.ac.uk
Wed Aug 5 12:44:46 CEST 2009


On Tue, 2009-08-04 at 18:37 +0100, Federico Calboli wrote:
> On 4 Aug 2009, at 18:27, David Winsemius wrote:
> 
> > Your first posting made me think that you were complaining that the
> > fitted values were less than the raw values. Your second posting makes
> > me think that you may be conflating the English word "less" with the
> > English word "fewer". Many native speakers make the same error, but in
> > this context it may be a critical problem for communicating what you
> > are seeing (or not seeing).
> >
> > Perhaps you could be more expansive about what you see and what you
> > expect with explicit attention to the numbers involved? Even better
> > would be a small *reproducible* example.
> 
> Problem solved, I realised there are NAs in the data which I had
> completely forgotten about (serves me right for digging up old data to
> add results to a paper). Without any irony or sarcasm, thanks for the  
> grammar correction, it might prove useful in the future.

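You can fit your model with the argument na.action = na.exclude. The rows
with missing values are still dropped from the fit, but fitted() and
residuals() then pad their results with NA at the positions of those
rows, so the output lines up with the original data. E.g.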

set.seed(123)
X <- rnorm(100)
Y <- 0.6 + (X * 0.5) + rnorm(100)
## simulate some missings in X
X[sample(length(X), 5)] <- NA
dat <- data.frame(X = X, Y = Y)
mod1 <- lm(Y ~ X, data = dat)
mod2 <- lm(Y ~ X, data = dat, na.action = na.exclude)
length(fitted(mod1))
length(fitted(mod2))
nrow(dat)
fitted(mod2)

> length(fitted(mod1))
[1] 95
> length(fitted(mod2))
[1] 100
> nrow(dat)
[1] 100
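One caveat, since the original question used mod$fitted.values: as far as I know the padding is applied by the extractor functions fitted() and residuals(), not stored in the model object, so the raw component still holds the complete cases only. A minimal sketch, reusing the simulated data from above (the column names fit and res are just illustrative choices):

```r
set.seed(123)
X <- rnorm(100)
Y <- 0.6 + (X * 0.5) + rnorm(100)
## simulate some missings in X
X[sample(length(X), 5)] <- NA
dat <- data.frame(X = X, Y = Y)
mod2 <- lm(Y ~ X, data = dat, na.action = na.exclude)

## the stored component is not padded; only the extractor is
length(mod2$fitted.values)   # complete cases only
length(fitted(mod2))         # one value per row of dat

## the NAs in the padded fitted values sit on the rows where X is missing
all(is.na(fitted(mod2)) == is.na(dat$X))

## so fitted values and residuals can be stored next to the data
dat$fit <- fitted(mod2)
dat$res <- residuals(mod2)
head(dat)
```

So with na.exclude it is safer to use fitted(mod2) than mod2$fitted.values when the result has to match the rows of the data frame.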

HTH

G

> Best,
> 
> Federico
> 
> 
> >
> > -- 
> > David
> >
> > On Aug 4, 2009, at 12:51 PM, Federico Calboli wrote:
> >
> >> Actually, I tried doing
> >>
> >> data2 = unique(data)
> >> mod = lm(y ~ x1 + ... + xn, data2)
> >> fitted(mod)
> >>
> >> and I still get less fitted values than observations.
> >>
> >> Federico
> >>
> >>
> >> On 4 Aug 2009, at 12:18, Federico Calboli wrote:
> >>
> >>> Hi All,
> >>>
> >>> I have some data where the dependent variable is a score, low (1:3)
> >>> or
> >>> high (8:9), and the independent variables are 21 genotypic markers.
> >>> I'm fitting a logistic regression on the whole dataset after
> >>> transforming the score to 0/1 and normal linear regression on the
> >>> high
> >>> and low subsets.
> >>>
> >>> In all cases I have a number of cases of data 'duplications', i.e.
> >>> different individuals with the same score and the same genotype at
> >>> the
> >>> 21 markers.
> >>>
> >>> When I do:
> >>>
> >>> mod$fitted.values I get a number of fitted values corresponding to
> >>> the
> >>> number of unique lines in the dataset. Is there a way to have the
> >>> fitted  values match the observation, even though some are  
> >>> duplicated
> >>> and so have the same fitted value? I could do it by hand but it's
> >>> laborious and I'd venture there is a better way.
> >>>
> >>> Best,
> >>>
> >>> Federico
> >>>
> >
> > David Winsemius, MD
> > Heritage Laboratories
> > West Hartford, CT
> >
> 
> --
> Federico C. F. Calboli
> Department of Epidemiology and Public Health
> Imperial College, St. Mary's Campus
> Norfolk Place, London W2 1PG
> 
> Tel +44 (0)20 75941602   Fax +44 (0)20 75943193
> 
> f.calboli [.a.t] imperial.ac.uk
> f.calboli [.a.t] gmail.com
> 
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
