[R] Problems with subset, droplevels and lm: variable lengths differ
Michael Friendly
friendly at yorku.ca
Mon Apr 16 19:43:43 CEST 2012
[Env: R 2.14.2 / Win XP]
In the script below, I want to select some variables from
rrcov::OsloTransect, delete cases with any missing data, and subset the
data frame Oslo to remove cases for two levels of the factor litho that
occur with low frequency.

The checks I run on my new data frame Oslo look OK, but when I try to fit
a multivariate linear model with lm(), I get an error: variable lengths
differ (found for 'litho').

How can I fix this?

> data(OsloTransect, package="rrcov")
> # keep a subset of variables & rename some variables
> Oslo <- OsloTransect[c("X.ID", "XCOO", "YCOO", "X.FOREST",
+                        "X.WEATHER", "X.FLITHO", "ALT")]
> colnames(Oslo) <- c("site", "XC", "YC", "forest", "weather", "litho",
+                     "altitude")
> Oslo <- cbind(Oslo, OsloTransect[,c("Cu", "Fe", "K", "Mg", "Mn", "P",
+                                     "Zn")])
> # make site a factor
> Oslo[,"site"] <- factor(Oslo[,"site"])
>
> # log transform the chemical elements
> Oslo[,8:14] <- log(Oslo[,8:14])
>
> # delete cases with missing data
> Oslo <- Oslo[complete.cases(Oslo),]
> nrow(Oslo)
[1] 350
>
> # delete low frequency litho=="GNEID_O" | "MICSH"
> Oslo <- subset(Oslo, !litho %in% c("GNEID_O", "MICSH"), drop=TRUE)
> nrow(Oslo)
[1] 332
> Oslo <- droplevels(Oslo)
> table(Oslo$litho)

 CAMSED GNEIS_O GNEIS_R    MAGM
     98      89      32     113
> nrow(Oslo)
[1] 332
> mod1 <- lm(cbind("Cu", "Fe", "K", "Mg", "Mn", "P", "Zn") ~ litho +
+            forest + weather, data=Oslo)
Error in model.frame.default(formula = cbind("Cu", "Fe", "K", "Mg", "Mn",  :
  variable lengths differ (found for 'litho')
>
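
A likely cause, sketched on made-up data rather than the Oslo data: the response names in the lm() call above are quoted, so cbind() combines seven length-1 character strings into a one-row character matrix instead of picking up the columns of the data frame. That one-row response then disagrees in length with litho. The data frame toy and its columns below are hypothetical stand-ins; only cbind(), lm(), try() and coef() from base R and stats are used.

```r
# Hypothetical toy data standing in for Oslo (not the real data)
set.seed(1)
toy <- data.frame(
  Cu    = rnorm(6),
  Fe    = rnorm(6),
  litho = factor(rep(c("A", "B"), each = 3))
)

# Quoted names are just character strings: the "response" becomes a
# 1 x 2 character matrix, not 6 rows of data
quoted <- cbind("Cu", "Fe")
dim(quoted)   # 1 2

# With a 1-row response and a 6-element predictor, model.frame() stops
# with an error; try() captures it so the script keeps running
res <- try(lm(cbind("Cu", "Fe") ~ litho, data = toy), silent = TRUE)
inherits(res, "try-error")

# Unquoted names are evaluated as columns of `toy`, giving a
# multivariate linear model with one coefficient column per response
mod <- lm(cbind(Cu, Fe) ~ litho, data = toy)
class(mod)
dim(coef(mod))
```

If this is indeed the problem here, writing the response as cbind(Cu, Fe, K, Mg, Mn, P, Zn) in the original call should let the model be fitted; the subset() and droplevels() steps would then not be at fault.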
--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street Web: http://www.datavis.ca
Toronto, ONT M3J 1P3 CANADA