[R] Unsuccessful beginner's struggle with lm
Duncan Murdoch
murdoch.duncan at gmail.com
Thu Aug 29 14:39:25 CEST 2013
On 13-08-29 8:23 AM, David Epstein wrote:
> I have two data frames, "train" and "response". Here is my attempt to do a
> linear regression. All entries of both data frames are numeric. I am
> expecting the intercept value to lie between 2 and 3 (in particular,
> non-zero).
lm() expects each variable in the formula to be a numeric vector (or a
factor). These are often columns of a data frame, but they cannot be
data frames themselves.
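
As a rough illustration of that point, here is a small sketch on made-up data. The names x and resp_df and the simulated values are hypothetical stand-ins, not the poster's objects. It shows a one-column data frame being rejected as the response, while the column extracted from it is accepted.

```r
# Hypothetical toy data; x and resp_df are illustrative names only.
set.seed(1)
x <- rnorm(20)
resp_df <- data.frame(y = 2.5 + 3 * x + rnorm(20, sd = 0.1))

# Using the data frame itself as the response fails; capture the message.
err <- tryCatch(lm(resp_df ~ x), error = function(e) conditionMessage(e))
print(err)

# Extracting the column gives a numeric vector, which lm() accepts.
fit <- lm(resp_df$y ~ x)
print(coef(fit))
```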
>
> Here is a record of my interaction with R:
>
>> class(response)
> [1] "data.frame"
>> c(nrow(response),ncol(response))
> [1] 1389 1
>> class(train)
> [1] "data.frame"
>> c(nrow(train),ncol(train))
> [1] 1389 256
>> beta.lm <- lm(response ~ train)
> Error in model.frame.default(formula = response ~ train, drop.unused.levels
> = TRUE) :
> invalid type (list) for variable 'response'
>
> What elementary syntax error am I making in my call to lm? And why does R
> think at first that the class of "response" is data.frame, but that its
> class is "list" when I call lm?
Data frames are lists with some extra rules added. The error message is
just reporting the low-level type, rather than the high-level class.
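
A minimal sketch of the distinction, using a small made-up data frame (the name d and its contents are assumptions for illustration): class() gives the high-level class, while typeof() gives the underlying storage type.

```r
# Hypothetical two-column data frame, for illustration only.
d <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5))

class(d)    # "data.frame": the high-level class
typeof(d)   # "list": the low-level storage type
is.list(d)  # TRUE: a data frame is a list of equal-length columns
```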
The way to do what you want is to include the response as a column in
the same data frame that holds the predictor variables. If you call
the data frame "df" and the response column "response", then the lm
call would look like

    lm(response ~ ., data=df)

The "." here means "all the other columns". You could also list them
explicitly, but 256 of them sounds like a lot...
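
Here is a sketch of how that could look end to end, on simulated data. The sizes, the simulated values, and the assumption that the single column of "response" is itself named "response" are mine, not taken from the original post. cbind() is one way to put the two data frames side by side, provided their rows are in the same order.

```r
# Simulated stand-ins for the poster's objects (3 predictors, not 256).
set.seed(42)
n <- 100
train <- as.data.frame(matrix(rnorm(n * 3), nrow = n))  # columns V1, V2, V3
response <- data.frame(
  response = 2.5 + train$V1 - 2 * train$V2 + rnorm(n, sd = 0.1)
)

# Combine into one data frame; rows are assumed to line up one to one.
df <- cbind(response, train)

# "." expands to every column of df other than the response.
fit <- lm(response ~ ., data = df)
print(coef(fit))
```

If the response column has a different name, that name goes on the left-hand side of the formula instead.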
Duncan Murdoch