[R] Working With Variables Having Different Lengths

Weidong Gu anopheles123 at gmail.com
Fri Oct 21 18:39:32 CEST 2011


Sounds like you are dealing with a missing data problem. By default, lm
or glm keeps only observations with complete records (complete-case
analysis). This can be problematic if many variables have missing
values and the values are not missing completely at random (i.e.,
missingness depends on other (un)measured variables or on the
missing values themselves). Imputation is a common tool for analyzing
incomplete data. In R, you can find packages that conduct
single or multiple imputation, e.g. randomForest, norm, mice, mi,
etc.
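
To see the default behaviour concretely, here is a minimal sketch. It
uses the built-in airquality data as a stand-in (an assumption; it is
not Rich's TDS/Cond data), and the object names are only illustrative.
It counts the missing values per variable and checks how many rows lm
actually used.

```r
# Stand-in data: airquality ships with R; Ozone and Solar.R contain NAs.
dat <- airquality[, c("Ozone", "Solar.R", "Wind")]

# Number of missing values in each variable.
n_missing <- colSums(is.na(dat))
print(n_missing)

# The default na.action (see getOption("na.action")) is na.omit, so lm()
# silently drops every row with an NA in any variable of the formula.
fit <- lm(Ozone ~ Solar.R + Wind, data = dat)

n_total    <- nrow(dat)                  # rows in the data frame
n_complete <- sum(complete.cases(dat))   # rows with no NA at all
n_used     <- nobs(fit)                  # rows the model was fitted on

cat("rows:", n_total, " complete:", n_complete, " used by lm:", n_used, "\n")
```

The number of rows used by lm should match the number of complete
cases, which can be much smaller than the length of the longest
variable.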

There is no easy way out with missing data problems: all imputations
are based on some strong and untestable assumptions.
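
As a rough illustration of multiple imputation, the sketch below uses
the mice package on the built-in airquality data. The data set, the
number of imputations (m = 5), the seed, and the model formula are all
assumptions chosen for the example, not a recommendation for Rich's
data. It relies on mice's default imputation methods, which assume the
data are missing at random.

```r
# Assumes the mice package is installed: install.packages("mice")
if (requireNamespace("mice", quietly = TRUE)) {
  # Stand-in data; Ozone and Solar.R contain NAs, Wind is complete.
  dat <- airquality[, c("Ozone", "Solar.R", "Wind")]

  # Create m = 5 imputed data sets using mice's default methods.
  imp <- mice::mice(dat, m = 5, printFlag = FALSE, seed = 2011)

  # Fit the same regression on each imputed data set ...
  fits <- with(imp, lm(Ozone ~ Solar.R + Wind))

  # ... and pool the estimates across imputations (Rubin's rules).
  pooled <- mice::pool(fits)
  print(summary(pooled))
}
```

Comparing the pooled estimates with the complete-case fit gives some
feel for how sensitive the conclusions are to the handling of the
missing values.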


Weidong Gu


On Fri, Oct 21, 2011 at 12:13 PM, Rich Shepard <rshepard at appl-ecosys.com> wrote:
>  Because of regulatory requirement changes over several decades and weather
> conditions preventing site access the variables in my data set have
> different lengths. I'd like guidance on how to perform linear regressions
> and other models with these variables.
>
>  For example, there are 2206 rows for the parameter "TDS" but only 1191
> rows for the parameter "Cond." Such discrepancies are common in these data.
>
>  Is there a reference I can read to learn how to analyze such data?
>
> Rich
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


