[R] importing files, columns "invade next column"
Tiago R Magalhaes
tiago17 at socrates.Berkeley.EDU
Wed Jan 19 05:25:00 CET 2005
Dear R-listers:
I want to import a reasonably big file into a table (15797 rows x 257
columns). The file is tab delimited with NA in every empty space. I
have reproduced below what I used as my read.table instruction. I have
read the R-dataImportExport FAQ and still couldn't solve my problem
(I might have missed it, of course). I'm using R 2.0.1 on a Mac G4,
OS X 10.3.7.
I can import the file, but one of the columns "invades the other",
meaning that if there is an empty space marked as NA in the first
column, it gets the value of the second column. I tried to import
four different files (details below) and I think the problem is with
the number of columns (with fewer columns it works).
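A quick check that may help narrow this down is to count the fields on each line, once under read.table's default separator (any white space) and once with tabs only. The sketch below uses a made-up three-line file, not the real data; the embedded space in "Gr 94a" is an assumption chosen only to show how the two counts can disagree.

```r
# Hypothetical miniature of the situation: a tab-delimited file in which
# one field contains a space (an assumption, not taken from the real file).
tmp <- tempfile()
writeLines(c("UNIQID\tAll.FB.Id\tSymbol",
             "NA\t10006\tp53",
             "NA\t10007\tGr 94a"), tmp)

# Fields per line when any white space separates fields (the default) ...
n_ws <- count.fields(tmp)
# ... and when only tabs separate fields.
n_tab <- count.fields(tmp, sep = "\t")

print(n_ws)
print(n_tab)
unlink(tmp)
```

If the two counts differ for some lines of the real file, or if the header line has a different count from the data lines, that would point at the separator rather than at the number of columns as such.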
workarounds:
a) I can separate my file into several files, import them and then
make one file in R
b) try to learn basic commands in awk? perl?
any advice on this?
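Workaround a) might look roughly like the following sketch. The two small data frames are hypothetical stand-ins for pieces that would be read from the separate files, and it assumes every piece has the same rows in the same order.

```r
# Hypothetical pieces standing in for data frames read from the split files.
part1 <- data.frame(UNIQID = c(NA, NA), All.FB.Id = c(10006, 10007))
part2 <- data.frame(Symbol = c("p53", "Gr94a"), stringsAsFactors = FALSE)

# Column-binding only makes sense if the pieces have matching rows.
stopifnot(nrow(part1) == nrow(part2))
combined <- cbind(part1, part2)

print(dim(combined))
print(names(combined))
```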
Another question (much less important): I have a binary file in Splus
for this object. I exported the object in Splus as it says in the FAQ
(data.dump), but data.restore doesn't exist as a function. Is it
because I'm using a Mac?
details of what I did:
##
a) importing a shorter version of my file (58 columns); I get the
"invading" behaviour and a row.names column whose origin I don't
understand. (UNIQID should be empty and 10006 should be in
All.FB.Id.)
> AllFBImpFields <- read.table('AllFBAllFieldsNAShorter.txt', fill=T, header=T,
+ row.names=paste('a',1:15797, sep=''),
+ as.is=T, nrows=15797)
> AllFBImpFields[1:2,1:5]
   row.names UNIQID All.FB.Id All.FB.5 All.FB.4
a1      <NA>  10006      <NA>     <NA>     <NA>
a2      <NA>  10007      <NA>     <NA>     <NA>
##
b) importing only 5 cols of the previous file. It works: there is no
"invasion" and the row.names column is not inserted
> AllFB5Cols <- read.table('AllFB5Cols.txt', fill=T, header=T,
+ row.names=paste('a',1:15797, sep=''),
+ as.is=T, nrows=15797)
> AllFB5Cols[1:2,1:5]
   UNIQID All.FB.Id Symbol       FB.gn CG.name
a1   <NA>     10006    p53 FBgn0039044 CG10873
a2   <NA>     10007  Gr94a FBgn0041225 CG31280
##
c) importing file with 4 rows, 58 columns; invasion behaviour and a
warning that I don't get in a) although the file is the same for the
first 4 rows
> x4rowsAllCol <- read.table('AllFB4rowsAllCols.txt', fill=T, header=T,
+ row.names=paste('a',1:4, sep=''),
+ as.is=T, nrows=4)
Warning message:
incomplete final line found by readTableHeader on `AllFB4rowsAllCols.txt'
> x4rowsAllCol[1:2,1:5]
   row.names UNIQID All.FB.Id All.FB.5 All.FB.4
a1        NA  10006        NA       NA       NA
a2        NA  10007        NA       NA       NA
##
d) importing file with 4 rows and 5 cols; result is like b) but gives
the same warning as c)!
> x4rows5cols <- read.table('AllFB4rows5cols.txt', fill=T, header=T,
+ row.names=paste('a',1:4, sep=''),
+ as.is=T, nrows=4)
Warning message:
incomplete final line found by readTableHeader on `AllFB4rows5cols.txt'
> x4rows5cols[1:2,1:5]
   UNIQID All.FB.Id All.FB.5 All.FB.4 All.FB.3
a1     NA     10006       NA       NA       NA
a2     NA     10007       NA       NA       NA
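One possible explanation, not confirmed for these files, is that the calls above leave sep at its default (any white space) although the file is tab delimited. If some data fields contain spaces, the data lines then split into one more field than the header, read.table prepends a "row.names" column name, and every value appears one column to the left. The sketch below reproduces that symptom on a made-up two-row file; the spaces inside the Symbol values are an assumption.

```r
# Hypothetical two-row file; the spaces inside the Symbol values are an
# assumption made only to reproduce the symptom.
tmp <- tempfile()
writeLines(c("UNIQID\tAll.FB.Id\tSymbol",
             "NA\t10006\tp 53",
             "NA\t10007\tGr 94a"), tmp)

# Default separator: data lines split into 4 fields, the header into 3.
shifted <- read.table(tmp, header = TRUE, fill = TRUE, as.is = TRUE,
                      row.names = c("a1", "a2"))

# Tabs only: every line has 3 fields and the columns stay in place.
aligned <- read.table(tmp, header = TRUE, sep = "\t", fill = TRUE,
                      as.is = TRUE, row.names = c("a1", "a2"))

print(names(shifted))
print(names(aligned))
unlink(tmp)
```

If this is what is happening, adding sep="\t" to the original calls may be enough, and it would also explain why the 5-column files behave: none of their fields would contain a space.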