[R] Unsuccessful beginner's struggle with lm

Duncan Murdoch murdoch.duncan at gmail.com
Thu Aug 29 14:39:25 CEST 2013


On 13-08-29 8:23 AM, David Epstein wrote:
> I have two data frames, "train" and "response". Here is my attempt to do a
> linear regression. All entries of both data frames are numeric. I am
> expecting the intercept value to lie between 2 and 3 (in particular,
> non-zero).

lm expects the variables in the formula to be numeric vectors (or 
factors).  They are often columns of a dataframe, but they can't be 
dataframes themselves.

>
> Here is a record of my interaction with R:
>
>> class(response)
> [1] "data.frame"
>> c(nrow(response),ncol(response))
> [1] 1389    1
>> class(train)
> [1] "data.frame"
>> c(nrow(train),ncol(train))
> [1] 1389  256
>> beta.lm <- lm(response ~ train)
> Error in model.frame.default(formula = response ~ train, drop.unused.levels
> = TRUE) :
>    invalid type (list) for variable 'response'
>
> What elementary syntax error am I making in my call to lm? And why does R
> think at first that the class of "response" is data.frame, but that its
> class is "list" when I call lm?

Dataframes are lists with some extra rules added.  class() shows the 
high level class ("data.frame"); the error from lm() is just reporting 
the low level type ("list") rather than the high level one.
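
To see the two levels side by side, here is a minimal sketch.  The 
one-column dataframe and its column name "y" are made up for 
illustration; they are not the poster's actual data.

```r
# Hypothetical one-column dataframe standing in for "response"
response <- data.frame(y = c(2.1, 2.5, 2.9))

class(response)       # high level class: "data.frame"
typeof(response)      # low level storage type: "list"

# A single column extracted with [[ ]] is a plain numeric vector,
# which is the kind of object a formula variable should be
y <- response[["y"]]
is.numeric(y)         # TRUE
is.numeric(response)  # FALSE: the dataframe as a whole is a list
```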

The way to do what you want is to include the response as a column in 
the same dataframe that includes the predictor variables.  If you call 
the dataframe "df" and the response column name "response", then the lm 
call would look like

lm(response ~ ., data=df)

The "." here means "all the other columns".  You could also list them 
explicitly, but 256 of them sounds like a lot...
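
As a rough sketch of the whole procedure, assuming simulated stand-ins 
for "train" and "response" (three predictors instead of 256, and column 
names x1, x2, x3 and y that are invented here), it could look like this:

```r
set.seed(1)
n <- 50

# Hypothetical stand-ins for the two dataframes in the question
train <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
response <- data.frame(y = 2.5 + train$x1 - 0.5 * train$x2 + rnorm(n, sd = 0.1))

# Put the response, as a numeric vector, in the same dataframe as the predictors
df <- data.frame(response = response[[1]], train)

# "." expands to every column of df other than the response
fit <- lm(response ~ ., data = df)
coef(fit)
```

Here response[[1]] pulls the single column out as a numeric vector, so 
the combined dataframe has one response column plus one column per 
predictor.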

Duncan Murdoch


