[R] importing files, columns "invade next column"
Tiago R Magalhaes
tiago17 at socrates.Berkeley.EDU
Thu Jan 20 02:12:31 CET 2005
Thanks again Marc for your help.
At this point I already have the whole file as a data.frame in R (via
an Splus dump and then R's source()), so this specific problem is
solved.
I had changed my file in Excel and thought everything was fine, but
apparently it wasn't. What program can be used to display a
tab-separated file in columns without corrupting the data?
I tried again from the initial file, and a very simple
x <- read.table('file.txt', header=T, sep='\t')
works fine. The sep='\t' is very important: without it, values are
imported into the wrong columns whenever there are empty fields next
to them.
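To illustrate the point, here is a minimal, self-contained sketch
using a small hypothetical two-row file (not the original data):

```r
# Hypothetical tab-delimited file with an empty middle field.
tmp <- tempfile()
writeLines(c("a\tb\tc", "1\t\t3"), tmp)

# The default sep="" treats the run of tabs as one separator, so the
# empty field disappears and '3' shifts left into column b.
bad <- read.table(tmp, header = TRUE, fill = TRUE)

# sep='\t' preserves the empty field as an NA in column b.
good <- read.table(tmp, header = TRUE, sep = "\t")
```

With sep='\t', good$b is NA and good$c is 3 as intended; with the
default separator, 3 lands in column b and column c is padded with NA.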
I would suggest again advising people to use sep='\t' for tab
delimited files in the help page for read.table.
##
If anyone is interested in a detailed history of the problem:
I had gotten my initial file by exporting it from Splus 6.1 (Windows
2000) as a tab delimited file.
I tried to open the file in R; it didn't work, so I opened the file
in Excel and substituted NA for the empty cells. I saved the file as a
txt file - tab delimited. This was the file from which I could read
only 9543 lines instead of the 15797 that the file has. The file was
probably corrupted through the use of Excel, so I guess the lesson is:
don't do this in Excel.
I went back to Splus, exported a new tab delimited file and tried again:
x <- read.table('file.txt', header=T, sep='\t') #works fine
x <- read.table('file.txt', header=T) #gives an error
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 1 did not have 194 elements
x <- read.table('file.txt', header=T, fill=T) #wrong: NA values end
up in the wrong columns
x <- read.table('file.txt', header=T, fill=T, sep='\t') #works fine
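For locating the rows that break an import like this, count.fields()
is handy. A small sketch, again with a hypothetical file rather than
the original one:

```r
# Hypothetical tab-delimited file where one line has too few fields.
tmp <- tempfile()
writeLines(c("a\tb\tc", "1\t2\t3", "4\t5"), tmp)

# count.fields() reports how many fields each line has, which makes
# it easy to find the lines that don't match the header.
n <- count.fields(tmp, sep = "\t")
which(n != n[1])  # -> 3, the short line
```

If read.table stops partway through a file (as with the 9543 of 15797
lines above), a stray quote or comment character in the data is
another common cause; passing quote="" and comment.char="" turns that
interpretation off.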
>On Wed, 2005-01-19 at 19:28 +0000, Tiago R Magalhaes wrote:
>> Thanks very much Marc and Prof Ripley
>>
>> a) using sep='\t' when using read.table() helps somewhat
>>
>> there is still a problem: I cannot get all the lines:
>> df <- read.table('file.txt', fill=T, header=T, sep='\t')
>> dim(df)
>> 9543 195
>>
>> while with the shorter file (11 cols) I get all the rows
>> dim(df)
>> 15797 11
>>
>> I have looked at row 9544 where the file seems to stop reading, but I
>> cannot see in any of the cols an obvious reason for this to happen.
>> Any ideas why? Maybe there is one col that is stopping the reading
>> process and that column is not one of the 11 that are present in the
>> smaller file.
>>
>> b) fill=T is necessary
>> without fill=T, I get an error:
>> "line 1892 did not have 195 elements"
>
>Tiago,
>
>How was this data file generated? Is it a raw file created by some other
>application or was it an ASCII export, perhaps from a spreadsheet or
>database program?
>
>It seems that there is something inconsistent in the large data file,
>which is either by design or perhaps the result of being corrupted by a
>poor export.
>
>It may be helpful to know how the file was generated in the effort to
>assist you.
>
>> c) help page for read.table
>> I reread the help page for read.table and I would suggest changing
>> it. From what I think I am reading, the '\t' should not be needed
>> for my file, but it actually is. From the help page:
>>
>> If 'sep = ""' (the default for 'read.table') the separator is "white
>> space", that is one or more spaces, tabs or newlines.
>
>Under normal circumstances, this should not be a problem, but given the
>unknowns about your file, it leaves an open question as to the etiology
>of the incorrect import.
>
>Marc