[R] Novice question about getting data into R

Fri Sep 19 19:14:18 CEST 2008

Ted Byers wrote:
> I found it easy to use R when typing data manually into it.  Now I need to
> read data from a file, and I get the following errors:
>
>> refdata = read.table("K:\\MerchantData\\RiskModel\\refund_distribution.csv", header = TRUE)
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
>   line 1 did not have 42 elements
>
>> refdata = read.table("K:\\MerchantData\\RiskModel\\refund_distribution.csv")
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
>   line 2 did not have 42 elements
>
> (I'd tried the first version above because the first record has column
> names.)
>
> First, I don't know why R expects 42 elements in a record.  
>   
Hard to tell. One guess is that you have 42 header names. Are there spaces
inside any of them? Is this really a CSV file (as in Comma Separated Values)?
If so, you at least need to set the sep= argument, since read.table() splits
fields on whitespace by default. But how about read.csv()? Or, if the file is
TAB-separated, read.delim().
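For illustration, a minimal sketch of that suggestion. The file below is a
made-up stand-in, not the poster's data: the column names and values are
invented, and the real file may differ in ways that matter. It checks how
many fields each line has when split on commas, then reads the file with
read.csv(), which defaults to sep = "," and header = TRUE.

```r
# Hypothetical stand-in for the real file: quoted column names containing
# spaces, comma-separated values, and a later column that runs out early.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  '"Weeks since sample","Week 18","Week 19","Week 20"',
  '1,0.10,0.12,0.09',
  '2,0.05,0.07,0.06',
  '3,0.02,0.03,'
), tmp)

# Diagnose first: number of fields on each line when splitting on commas.
# Lines that disagree with the header count are the ones to inspect.
count.fields(tmp, sep = ",")

# read.csv() splits on commas and treats the first line as column names.
# Spaces in the names are converted to dots by default (check.names = TRUE).
refdata <- read.csv(tmp)
str(refdata)
```

The empty trailing field in the last row comes back as NA, so columns of
unequal length can live in one data frame.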
> There is one column for a time variable (weeks since a given week of samples
> were taken) and one for each week of sampling in the data file (Week 18
> through Week 37 inclusive).  And there are only 19 rows.
> The samples represented by the columns are independent, and the numbers in
> the columns are the fraction of events sampled that result in an event of
> another kind in the week since the sample was taken.
>
> The samples are not the same size, and starting with week 20, the number of
> values progressively gets smaller since there have been fewer than 37 weeks
> since the samples were taken.
>
> I can show you the contents of the data file if you wish.  It is
> unremarkable, csv, with strings used for column names enclosed in double
> quotes.
>   
You might well have to. One man's "unremarkable" can be remarkably
different from others'...
> I don't have to manually separate the samples into their own files do I?  I
> was hoping to write a function that estimates the density function that best
> fits each sample individually, and then iterate over the columns, applying
> that function to each in turn.
>
> What is the best way to handle this?
>
> Thanks
>
> Ted
>
>
>   
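On the last question, splitting the file should not be necessary. A data
frame is a list of columns, so a function can be applied to each column in
turn. A rough sketch with invented data follows; the column names are
hypothetical, fit_one is a made-up helper, and density() (a kernel density
estimate) is only a placeholder for whatever fitting procedure is wanted.

```r
# Invented data standing in for the real samples; the later column is
# shorter, padded with NA as it would be after reading a ragged CSV file.
set.seed(1)
refdata <- data.frame(
  Weeks   = 1:10,
  Week.18 = runif(10),
  Week.19 = c(runif(8), NA, NA)
)

# Hypothetical per-sample function: drop missing values, then estimate.
fit_one <- function(x) density(x[!is.na(x)])

# Apply it to every column except the first (the time variable).
# The result is a list with one fitted object per sample column.
fits <- lapply(refdata[-1], fit_one)
names(fits)
```

Each element of fits can then be inspected or plotted on its own, for
example with plot(fits[["Week.18"]]).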


-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907