[R] importing files, columns "invade next column"
Tiago R Magalhaes
tiago17 at socrates.Berkeley.EDU
Wed Jan 19 20:28:38 CET 2005
Thanks very much, Marc and Prof. Ripley.
a) Using sep='\t' with read.table() helps somewhat.
There is still a problem: I cannot get all the rows:
df <- read.table('file.txt', fill=T, header=T, sep='\t')
dim(df)
9543 195
while with the shorter file (11 columns) I get all the rows:
dim(df)
15797 11
I have looked at row 9544, where reading seems to stop, but I cannot see
an obvious reason for this in any of the columns. Any ideas why? Maybe
one column is stopping the reading process, and that column is not one
of the 11 present in the smaller file.
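A diagnostic sketch, using the same 'file.txt' placeholder as above:
count.fields() reports how many tab-separated fields each line splits
into, and stray quote or '#' characters in the data are a common reason
read.table() silently drops the remaining rows.

# count fields per line with quoting and comments switched off
nf <- count.fields('file.txt', sep = '\t', quote = '', comment.char = '')
table(nf)                 # distribution of field counts across lines
head(which(nf != 195))    # first lines that do not split into 195 fields

# if stray quotes or '#' are the cause, disabling them often recovers all rows
df <- read.table('file.txt', header = TRUE, sep = '\t',
                 quote = '', comment.char = '', fill = TRUE)
dim(df)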
b) fill=T is necessary.
Without fill=T, I get an error:
"line 1892 did not have 195 elements"
c) Help page for read.table
I reread the help page for read.table and would suggest changing it. From
what I think I am reading, sep='\t' should not be needed for my file, but
it actually is. From the help page:
If 'sep = ""' (the default for 'read.table') the separator is "white
space", that is one or more spaces, tabs or newlines.
d) I incorrectly mentioned the FAQ in relation to data.restore. Where I
actually saw data.restore mentioned was in the 'R Data Import/Export'
manual, which I read (even more than once...) while failing to read the
first paragraph of the section where it is stated that the foreign
package is used.
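For reference, the foreign-package route looks roughly like this
(assuming the object was written out with data.dump() in S-PLUS;
'mydf.sdd' is a made-up file name):

library(foreign)
data.restore('mydf.sdd')   # recreates the dumped object(s) in the workspace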
It works (with source):
In S-PLUS 6.1 on Windows 2000: dump('file')
In R 2.0.1 on Mac OS X 10.3.7: source('file')
I get a list where the first element is the data frame I want; the
column names have 'value' added to them.
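A sketch of the clean-up on the R side, assuming the dumped object comes
back under the (hypothetical) name 'mydata' and the names gained a
'value.' prefix (the exact form may differ):

source('file')                        # recreates the dumped object(s)
df <- mydata[[1]]                     # first element is the data frame
names(df) <- sub('^value\\.', '', names(df))   # strip the added prefix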
>On Wed, 2005-01-19 at 04:25 +0000, Tiago R Magalhaes wrote:
>> Dear R-listers:
>>
>> I want to import a reasonably big file into a table. (15797 x 257
>> columns). The file is tab delimited with NA in every empty space.
>
>Tiago,
>
>Have you tried to use read.table() explicitly defining the field
>delimiting character as a tab to see if that changes anything?
>
>Try the following:
>
>AllFBImpFields <- read.table('AllFBAllFieldsNAShorter.txt',
> header = TRUE,
> row.names=paste('a',1:15797, sep=''),
> as.is = TRUE,
> sep = "\t")
>
>I added the 'sep = "\t"' argument at the end.
>
>Also, leave out the 'fill = TRUE', which can cause problems. You do not
>need this unless your source file has a varying number of fields per
>line.
>
>Note that you do not need to specify the 'nrows' argument unless you
>generally want something less than all of the rows. Using the
>combination of 'skip' and 'nrows', you can read a subset of rows from
>the middle of the input file.
>
>See if that helps. Usually when there are column alignment problems, it
>is because the rows are not being consistently parsed into fields, which
>is frequently the result of not having the proper delimiting character
>specified.
>
>The last thought is to be sure that a '#' is not in your data file. This
>is interpreted as a comment character by default, which means that
>anything after it on a row will be ignored.
>
>HTH,
>
>Marc Schwartz