[R] Novice question about getting data into R
Peter Dalgaard
P.Dalgaard at biostat.ku.dk
Fri Sep 19 19:14:18 CEST 2008
Ted Byers wrote:
> I found it easy to use R when typing data manually into it. Now I need to
> read data from a file, and I get the following errors:
>
>
> > refdata = read.table("K:\\MerchantData\\RiskModel\\refund_distribution.csv",
> +                      header = TRUE)
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
>   line 1 did not have 42 elements
>
> > refdata = read.table("K:\\MerchantData\\RiskModel\\refund_distribution.csv")
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
>   line 2 did not have 42 elements
>
>
> (I'd tried the first version above because the first record has column
> names.)
>
> First, I don't know why R expects 42 elements in a record.
>
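Hard to tell. One guess is that you have 42 header names: the second
call complains about line 2 rather than line 1, so line 1 (the header)
apparently does split into 42 fields. Spaces inside any of them? Is this
really a CSV file (as in Comma Separated Values)? If so, you at least
need to set the sep= argument, since read.table() splits on whitespace
by default. But how about read.csv()? Or, if it is TAB separated,
read.delim().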
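
A minimal sketch of the read.csv() route, assuming the file really is
comma separated with quoted header names. The file contents and column
names below are made up for illustration and are not the actual data:

```r
# Hypothetical stand-in for the real file: quoted header names containing
# spaces, comma separated, with the last column one value short.
csv_lines <- c(
  '"Weeks Since","Week 18","Week 19","Week 20"',
  '1,0.10,0.12,0.11',
  '2,0.05,0.06,0.04',
  '3,0.02,0.03,'
)
tmp <- tempfile(fileext = ".csv")
writeLines(csv_lines, tmp)

# read.csv() calls read.table() with, among other defaults, sep = ",",
# header = TRUE and fill = TRUE, so empty trailing fields become NA.
refdata <- read.csv(tmp)

# The same import spelled out with read.table().
refdata2 <- read.table(tmp, header = TRUE, sep = ",", fill = TRUE)

# Column names are made syntactically valid ("Week 18" becomes Week.18).
str(refdata)
```

If the header names should be kept exactly as written in the file,
check.names = FALSE can be passed to either function.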
> There is one column for a time variable (weeks since a given week of samples
> were taken) and one for each week of sampling in the data file (Week 18
> through Week 37 inclusive). And there are only 19 rows.
> The samples represented by the columns are independent, and the numbers in
> the columns are the fraction of events sampled that result in an event of
> another kind in the week since the sample was taken.
>
> The samples are not the same size, and starting with week 20, the number of
> values progressively gets smaller since there have been fewer than 37 weeks
> since the samples were taken.
>
> I can show you the contents of the data file if you wish. It is
> unremarkable, csv, with strings used for column names enclosed in double
> quotes.
>
You might well have to. One man's "unremarkable" can be remarkably
different from others'...
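
Short of posting the whole file, something along these lines shows what
R actually sees in it. This is only a sketch on a made-up file; for the
real one, substitute its path for tmp:

```r
# Hypothetical file standing in for the real one.
tmp <- tempfile(fileext = ".csv")
writeLines(c('"Weeks Since","Week 18","Week 19"',
             '1,0.10,0.12',
             '2,0.05,0.07'), tmp)

# The first few raw lines, exactly as they are stored in the file.
head_lines <- readLines(tmp, n = 3)
print(head_lines)

# Number of fields per line with the default whitespace splitting ...
n_ws <- count.fields(tmp)
# ... and when splitting on commas instead.
n_comma <- count.fields(tmp, sep = ",")
print(n_ws)
print(n_comma)
```

If the comma-separated counts agree across lines and the whitespace
counts do not, the separator is the likely culprit.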
> I don't have to manually separate the samples into their own files do I? I
> was hoping to write a function that estimates the density function that best
> fits each sample individually, and then iterate over the columns, applying
> that function to each in turn.
>
> What is the best way to handle this?
>
> Thanks
>
> Ted
>
>
>
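
On the last question: once the data are in a single data frame, there
should be no need to split the samples into separate files. A rough
sketch, assuming one time column followed by one column per sample,
padded with NA where a sample is shorter. The data frame, the column
names and fit_one() are hypothetical, and density() is only a
placeholder for whatever fitting method is actually wanted:

```r
# Hypothetical data frame standing in for the imported file.
refdata <- data.frame(
  Weeks   = 1:6,
  Week.18 = c(0.30, 0.22, 0.15, 0.10, 0.06, 0.03),
  Week.19 = c(0.28, 0.20, 0.14, 0.09, 0.05, NA),
  Week.20 = c(0.31, 0.21, 0.13, 0.08, NA, NA)
)

# Placeholder fitting routine: a kernel density estimate of the
# non-missing values in one column.
fit_one <- function(x) density(x[!is.na(x)])

# Apply it to every sample column, i.e. everything but the time column.
fits <- lapply(refdata[-1], fit_one)

# One fitted object per sample, named after its column.
names(fits)
fits$Week.20$n   # number of non-missing values used for this sample
```

lapply() returns a list with one element per column, so each fit can be
inspected or plotted on its own afterwards.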
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907