[R] Multivariable model

Luciano La Sala lucianolasala at yahoo.com.ar
Tue May 24 15:57:09 CEST 2011


Hello r-list members, 

I've been doing some linear modeling with a dataset structured as follows.
Tubes, each containing 500 Trichinella larvae, were assigned to one of
four temperature treatments. Each day (or every 10 days, depending on the
treatment group), 3 tubes were selected from each treatment and all the dead
larvae were counted. The tubes were then discarded. Final larvae counts were
averaged.

Then we have:

y = dead larvae (count)


X1 = Day: dead larvae in three tubes were counted either daily
(tubes at -30 ºC) or every 10 days (tubes at -20 ºC, 4 ºC, and lab temp.).
This was done for each treatment until all the larvae in all three tubes
were dead.

X2 = Temperature treatment (factor with 4 levels): -30 ºC, -20 ºC, 4 ºC,
and lab temperature.

Because we counted larvae for each treatment until all 500 "larvae of
the day" (batch of three tubes) were dead, the experiment was terminated at
different times for each treatment (e.g. day 95 for -30, day 200 for -20,
and so on). This led to a final dataset containing data collected over
different time ranges.
  
Days    Dead_larvae   Group
1       100           30 below
2       145           30 below
3       277           30 below
4       284           30 below
5       288           30 below
6       294           30 below
7       359           30 below
.       .             .      
.       .             .      
.       .             .      
95      500           30 below      

10      25            20 below
20      35            20 below
30      105           20 below
40      230           20 below
.       .             .  
.       .             .
.       .             .
200     500           20 below
.       .             .   
.       .             .
.       .             . 
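To make the layout concrete, here is a minimal sketch that rebuilds only the rows printed above as an R data frame. The object name Data_excerpt is made up for illustration; the real dataset has many more rows and all four groups.

```r
# Only the rows shown in the table above, not the full dataset
Data_excerpt <- data.frame(
  Days        = c(1:7, 10, 20, 30, 40),
  Dead_larvae = c(100, 145, 277, 284, 288, 294, 359, 25, 35, 105, 230),
  Group       = factor(rep(c("30 below", "20 below"), times = c(7, 4)),
                       levels = c("30 below", "20 below"))
)

str(Data_excerpt)

# Range of Days covered by each group (differs because sampling
# schedules and end points differ between treatments)
day_ranges <- tapply(Data_excerpt$Days, Data_excerpt$Group, range)
day_ranges
```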

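Model specification (the output below keeps the original Spanish variable
names: Larvas_muertas = dead larvae, Dias = days, Grupo = group,
Amb = lab temperature):

> my_model <- lm(Dead_larvae ~ Days + I(Days^2) + Group, data = Data)
> summary(my_model)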

Call:
lm(formula = Larvas_muertas ~ Dias + I(Dias^2) + Grupo, data = Data)

Residuals:
     Min       1Q   Median       3Q      Max 
-356.983  -31.229    3.606   37.768  170.846 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.556e+02  6.634e+00   53.60   <2e-16 ***
Dias         2.403e+00  9.963e-02   24.11   <2e-16 ***
I(Dias^2)   -2.732e-03  1.885e-04  -14.49   <2e-16 ***
Grupo-20    -1.422e+02  1.283e+01  -11.08   <2e-16 ***
Grupo4      -3.117e+02  1.188e+01  -26.23   <2e-16 ***
GrupoAmb    -3.830e+02  1.212e+01  -31.59   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 67.04 on 348 degrees of freedom
  (8 observations deleted due to missingness)
Multiple R-squared: 0.7879,     Adjusted R-squared: 0.7848 
F-statistic: 258.5 on 5 and 348 DF,  p-value: < 2.2e-16
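To show how I am reading these coefficients, here is a small worked sketch that plugs the printed estimates into the model equation. It assumes the reference level of the group factor is the -30 group (inferred from the fact that only -20, 4 and Amb appear as coefficients); the helper name fitted_count is made up for illustration.

```r
# Coefficients copied from the summary output above
b <- c(intercept = 3.556e+02, day = 2.403e+00, day2 = -2.732e-03,
       g20 = -1.422e+02, g4 = -3.117e+02, gAmb = -3.830e+02)

# Fitted mean count for a given day; 'shift' is the group offset
# (0 for the reference group, assumed here to be the -30 group)
fitted_count <- function(day, shift = 0) {
  b[["intercept"]] + b[["day"]] * day + b[["day2"]] * day^2 + shift
}

fitted_count(10)               # reference group, day 10: about 379.4
fitted_count(10, b[["gAmb"]])  # lab temperature, day 10: about -3.6

# Day at which the fitted quadratic reaches its maximum
turning_day <- -b[["day"]] / (2 * b[["day2"]])
turning_day
```

With this additive specification all groups share one curve in Days and differ only by a constant shift, which is why the fitted value for the lab temperature group at day 10 comes out negative.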


Q1. Is the modeling approach / specification correct? 

Q2. Larvae were counted over different periods of time, which leads to
markedly different ranges of X1 for each treatment. Is this a serious
problem? Might it lead to seriously biased estimates?

Q3. Am I violating the assumption of independent residuals, given that
residuals from different time points may be correlated? If so, how can one
deal with this in R?
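As a starting point for Q3, here is a rough sketch of one check that could be run: the lag-1 correlation of the residuals within each group, ordered by day. It uses simulated stand-in data, not my actual counts, and the names sim, ar_err and lag1 are made up for illustration. It is only a diagnostic under those assumptions, not a remedy.

```r
set.seed(1)

# Simulated stand-in data (NOT the real counts): two groups sampled on
# different schedules, with serially correlated noise added on purpose
sim <- data.frame(
  Days  = c(1:95, seq(10, 200, by = 10)),
  Group = factor(rep(c("30 below", "20 below"), times = c(95, 20)))
)
ar_err <- function(n) as.numeric(arima.sim(list(ar = 0.7), n = n, sd = 20))
time_scale <- ifelse(sim$Group == "30 below", 20, 60)
sim$Dead_larvae <- 500 * (1 - exp(-sim$Days / time_scale)) +
  c(ar_err(95), ar_err(20))

# Same specification as in the post, fitted to the simulated data
fit <- lm(Dead_larvae ~ Days + I(Days^2) + Group, data = sim)

# Rows are already sorted by Days within each group, so consecutive
# residuals within a group are consecutive sampling occasions
res_by_group <- split(residuals(fit), sim$Group)
lag1 <- sapply(res_by_group, function(r) cor(r[-1], r[-length(r)]))
lag1
```

Values of lag1 far from zero would suggest that neighbouring residuals are not independent; calling acf() on each element of res_by_group gives the same information for longer lags.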

I know my question is as much a statistical one as an R-related one, so
apologies in advance.
Best regards,

Luciano 


