[R] Why isn't R recognising integers as numbers?

(Ted Harding) Ted.Harding at manchester.ac.uk
Mon Sep 22 10:30:52 CEST 2008


Hi Ted (from Ted),
Just to clarify Marc's comments about dataframes in more basic terms.

If you read in data with read.csv() the result returned by the function
is a dataframe. This is a specialised kind of list, which you can think
of as a list of "columns" all of the same length. You can think of each
"column" as a vector of elements, all of which must be of the same type
within the column, though the type can vary (e.g. numeric, factor,
character) between columns. When you display a dataframe, it looks like
a matrix, though in R terms it is not really a matrix; it is a list,
where each component of the list is a "column".
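
A quick way to check this for yourself is sketched below. The dataframe 'df' and its column names are made up purely for illustration; the functions are all base R.

```r
# Hypothetical two-column dataframe, for illustration only
df <- data.frame(count = c(3L, 5L, 2L),
                 label = c("a", "b", "c"),
                 stringsAsFactors = FALSE)

is.list(df)        # TRUE: a dataframe is a list underneath
is.data.frame(df)  # TRUE
is.matrix(df)      # FALSE: it only prints like a matrix
length(df)         # 2: the number of components ("columns")
sapply(df, class)  # one type per column: "integer" and "character"
```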

Of course a dataframe, like any list, might have only one component.
But it is still a list -- and the actual contents are only available
"one layer down", after you have extracted that component by some
means (e.g. by using the "$" extractor). Simple example:

  L <- c(1,2,3,4)         ## vector
  L
# [1] 1 2 3 4
  L.df <- data.frame(L=L) ## Dataframe with 1 component named "L"
  L.df
#   L
# 1 1
# 2 2
# 3 3
# 4 4
  L.df$L                  ## Extract the component named "L"
# [1] 1 2 3 4             ## Compare with the result of 'L' above

# Try a regression on L (this works):
  lm(L ~ 1)
# Call:
# lm(formula = L ~ 1)
# Coefficients:
# (Intercept)  
#         2.5  

# Try a regression on L.df (this doesn't work):
  lm(L.df ~ 1)
# Error in model.frame.default(formula = L.df ~ 1,
#   drop.unused.levels = TRUE) : 
#   invalid type (list) for variable 'L.df'

# But it does after you refer to the component L by name:
  lm(L.df$L ~ 1)
# Call:
# lm(formula = L.df$L ~ 1)
# Coefficients:
# (Intercept)  
#         2.5  

# or:
  lm(L ~ 1, data=L.df)
# Call:
# lm(formula = L ~ 1, data = L.df)
# Coefficients:
# (Intercept)  
#         2.5  

# But you can (for a dataframe, not a general list) use an "index"
# method of extraction *as if* it were a matrix (even though it isn't):

  L.df[,1]
# [1] 1 2 3 4
  L.df[3,1]
# [1] 3

# But compare with:
  L.df[1]
#   L
# 1 1
# 2 2
# 3 3
# 4 4

which is essentially the same as L.df itself: L.df[1] is still a
dataframe, not a vector, so lm(L.df[1] ~ 1) fails in exactly the
same way as lm(L.df ~ 1) did.
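
To make the difference between these forms of extraction explicit, here is a small sketch using class(). It rebuilds the same L.df as above; the double-bracket form "[[" and the 'drop' argument are standard R, but were not used in the examples above.

```r
L.df <- data.frame(L = c(1, 2, 3, 4))

class(L.df[1])                  # "data.frame": still a list with one component
class(L.df[[1]])                # "numeric": the column itself
class(L.df[["L"]])              # "numeric": same column, selected by name
class(L.df[, 1])                # "numeric": matrix-style index drops to a vector
class(L.df[, 1, drop = FALSE])  # "data.frame": drop = FALSE keeps the dataframe

identical(L.df[[1]], L.df$L)    # TRUE
```

So anything that wants a numeric vector can be given L.df$L, L.df[[1]] or L.df[,1], but not L.df[1].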

The dataframe structure exists in R because so much data is typically
in the row by column (case by variables) layout such as you get in
spreadsheets and associated CSV files, and it is very useful to be
able to get into this layout directly (and refer to the variables
by name, as above).
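
The same thing can be seen with read.csv() itself. The sketch below reads a few made-up integers from a text string instead of a file (the column name 'count' and the values are assumptions for illustration), which stands in for a one-column CSV file like the one in the original question.

```r
# Made-up data standing in for a one-column CSV file of integers
csv_text <- "count\n3\n5\n2\n7\n4"
refdata <- read.csv(text = csv_text, header = TRUE)

class(refdata)             # "data.frame", not a vector
is.numeric(refdata)        # FALSE: the dataframe itself is a list
is.numeric(refdata$count)  # TRUE: the column is an integer vector
mean(refdata$count)        # 4.2
```

A function that insists on a numeric vector will therefore reject 'refdata' but accept 'refdata$count'.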

The full generality of a 'list' can also be useful for encapsulating
data of a less strictly structured kind, but that is another (longer)
story!
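
Just as a taste of it, a minimal sketch: the list below (its name and contents are invented for illustration) has components of different types and lengths, which a dataframe would not accept.

```r
# Hypothetical list whose components differ in type and length
week_info <- list(values = c(3, 5, 2, 7),
                  label  = "week 18",
                  flags  = c(TRUE, FALSE, TRUE))

sapply(week_info, length)  # 4, 1 and 3: no common length is required
week_info$label            # "week 18"

# Converting it to a dataframe fails, because 4 rows and 3 rows cannot
# be reconciled; try() captures the error instead of stopping
res <- try(as.data.frame(week_info), silent = TRUE)
inherits(res, "try-error")  # TRUE
```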

Hoping this helps.
Ted.


On 22-Sep-08 02:09:29, Ted Byers wrote:
> Thanks Marc,
> That was it. 
> 
> For the last 30 years, I'd write my own code, in FORTRAN, C++,
> or even Java, to do whatever statistical analysis I needed.
> When at the office, sometimes I could use SAS, but that hasn't
> been an option for me in years.
> 
> This is the first time I have had to load real data into R
> (instead of generating random data to use while playing with
> some of the stats functions, or manually typing dummy data).
> 
> I take it, then, that the result of loading data is a data
> frame, and not just a matrix or array. Using something like
> "refdata18[, 1]" feels rather alien, but I'm sure I'll quickly
> get used to it.  I'd seen it before in the R docs, but it didn't
> register that I had to use it to get the functions of most
> interest to me to recognise my data as a vector of numbers,
> given I'd provided only a vector of integers as input.
> 
> Thanks
> 
> Ted
> 
> 
> Marc Schwartz wrote:
>> 
>> on 09/21/2008 08:01 PM Ted Byers wrote:
>>> I have a number of files containing anywhere from a few dozen to a few
>>> thousand integers, one per record.
>>> 
>>> The statement "refdata18 =
>>> read.csv("K:\\MerchantData\\RiskModel\\Capture.Week.18.csv", header =
>>> TRUE,na.strings="")" works fine, and if I type refdata18, I get the
>>> integers displayed, one value per record (along with a record number).
>>> However, when I try " fitdistr(refdata18,"negative binomial")", or
>>> hist.scott(refdata18, prob = TRUE), I get an error:
>>> 
>>> Error in fitdistr(refdata18, "negative binomial") : 
>>>   'x' must be a non-empty numeric vector
>>> Or
>>> Error in hist.default(x, nclass.scott(x), prob = prob, xlab = xlab, ...) :
>>>   'x' must be numeric
>>> 
>>> How can it not recognise integers as numbers?
>>> 
>>> Thanks
>>> 
>>> Ted
>> 
>> 'refdata18' is a data frame and the two functions are expecting a
>> numeric vector.
>> 
>> If you use:
>> 
>>   fitdistr(refdata18[, 1], "negative binomial")
>> 
>> or
>> 
>>   hist(refdata18[, 1])
>> 
>> you should get a suitable result, presuming that the first column in
>> the data frame is a numeric vector.
>> 
>> Use:
>> 
>>   str(refdata18)
>> 
>> to get a sense for the structure of the data frame, including the
>> column names, which you could then use, instead of the above index
>> based syntax.
>> 
>> HTH,
>> 
>> Marc Schwartz
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>> 
>> 
> 

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 22-Sep-08                                       Time: 09:30:47
------------------------------ XFMail ------------------------------