[R] importing files, columns "invade next column"
Tiago R Magalhaes
tiago17 at socrates.Berkeley.EDU
Wed Jan 19 20:28:38 CET 2005
Thanks very much, Marc and Prof. Ripley.
a) Using sep='\t' with read.table() helps somewhat.
There is still a problem: I cannot get all the rows:
df <- read.table('file.txt', fill=T, header=T, sep='\t')
dim(df)
9543 195
while with the shorter file (11 columns) I get all the rows:
dim(df)
15797 11
I have looked at row 9544, where reading seems to stop, but I cannot see
an obvious reason for this in any of the columns. Any ideas why? Maybe
one column is stopping the reading process, and that column is not one
of the 11 present in the smaller file.
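A diagnostic sketch, using the same 'file.txt' placeholder as above:
count.fields() reports how many tab-separated fields each line splits
into, and stray quote or '#' characters in the data are a common reason
read.table() silently drops the remaining rows.

# count fields per line with quoting and comments switched off
nf <- count.fields('file.txt', sep = '\t', quote = '', comment.char = '')
table(nf)                 # distribution of field counts across lines
head(which(nf != 195))    # first lines that do not split into 195 fields

# if stray quotes or '#' are the cause, disabling them often recovers all rows
df <- read.table('file.txt', header = TRUE, sep = '\t',
                 quote = '', comment.char = '', fill = TRUE)
dim(df)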
b) fill=T is necessary.
Without fill=T, I get an error:
"line 1892 did not have 195 elements"
c) Help page for read.table
I reread the help page for read.table and would suggest changing it. From
what I think I am reading, sep='\t' should not be needed for my file, but
it actually is. From the help page:
If 'sep = ""' (the default for 'read.table') the separator is "white
space", that is one or more spaces, tabs or newlines.
d) I incorrectly mentioned the FAQ in relation to data.restore. Where I
actually saw data.restore mentioned was in the 'R Data Import/Export'
manual, which I read (even more than once...) while failing to read the
first paragraph of the section where it is stated that the foreign
package is used.
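For reference, the foreign-package route looks roughly like this
(assuming the object was written out with data.dump() in S-PLUS;
'mydf.sdd' is a made-up file name):

library(foreign)
data.restore('mydf.sdd')   # recreates the dumped object(s) in the workspace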
It works (with source):
In S-PLUS 6.1 on Windows 2000: dump('file')
In R 2.0.1 on Mac OS X 10.3.7: source('file')
I get a list where the first element is the data frame I want; the
column names have 'value' added to them.
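A sketch of the clean-up on the R side, assuming the dumped object comes
back under the (hypothetical) name 'mydata' and the names gained a
'value.' prefix (the exact form may differ):

source('file')                        # recreates the dumped object(s)
df <- mydata[[1]]                     # first element is the data frame
names(df) <- sub('^value\\.', '', names(df))   # strip the added prefix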
>On Wed, 2005-01-19 at 04:25 +0000, Tiago R Magalhaes wrote:
>> Dear R-listers:
>>
>> I want to import a reasonably big file into a table. (15797 x 257
>> columns). The file is tab delimited with NA in every empty space.
>
>Tiago,
>
>Have you tried to use read.table() explicitly defining the field
>delimiting character as a tab to see if that changes anything?
>
>Try the following:
>
>AllFBImpFields <- read.table('AllFBAllFieldsNAShorter.txt',
> header = TRUE,
> row.names=paste('a',1:15797, sep=''),
> as.is = TRUE,
> sep = "\t")
>
>I added the 'sep = "\t"' argument at the end.
>
>Also, leave out the 'fill = TRUE', which can cause problems. You do not
>need this unless your source file has a varying number of fields per
>line.
>
>Note that you do not need to specify the 'nrows' argument unless you
>generally want something less than all of the rows. Using the
>combination of 'skip' and 'nrows', you can read a subset of rows from
>the middle of the input file.
>
>See if that helps. Usually when there are column alignment problems, it
>is because the rows are not being consistently parsed into fields, which
>is frequently the result of not having the proper delimiting character
>specified.
>
>The last thought is to be sure that a '#' is not in your data file. This
>is interpreted as a comment character by default, which means that
>anything after it on a row will be ignored.
>
>HTH,
>
>Marc Schwartz