[Rd] read.table segfaults
Göran Broström
goran.brostrom at gmail.com
Sat Aug 27 11:27:47 CEST 2011
On Fri, Aug 26, 2011 at 11:55 PM, Ben Bolker <bbolker at gmail.com> wrote:
> Scott <ncbi2r at googlemail.com> writes:
>
>>
>> It does look like you've got a memory issue. Perhaps using
>> as.is=TRUE and/or stringsAsFactors=FALSE as optional arguments to
>> read.table will help.
>>
>> If you don't specify these sorts of things, R has to look through the
>> file and figure out which columns are characters/factors etc., and so
>> larger files cause more of a headache for R, I'm guessing. Hopefully
>> someone else can comment further on this. I'd try toggling TRUE/FALSE
>> for as.is and stringsAsFactors, or give colClasses explicitly (see
>> the sketch below).
>>
>> Do you have other objects loaded in memory as well? This file by
>> itself might not be the problem - it could be a cumulative issue.
>> Have you checked the file structure in any other manner?
>> How large (MB/kB) is the file that you're trying to read?
>> If you just read in parts of the file, is it okay?
>> read.table(filename,header=FALSE,sep="\t",nrows=100)
>> read.table(filename,header=FALSE,sep="\t",skip=20000,nrows=100)
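>>
>> Something along these lines (a sketch - the column count and classes
>> here are made up; adjust them to your file) lets read.table skip the
>> type-detection pass entirely:
>>
>> ss <- read.table(filename, header=FALSE, sep="\t",
>>                  stringsAsFactors=FALSE,
>>                  colClasses=c("character", rep("numeric", 18)))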
>
> There seem to be two issues here:
>
> 1. What can the original poster (OP) do to work around this problem?
> (e.g. get the data into a relational database and import it from
> there; use something from the High Performance task view such as
> ff or data.table ...)
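>
> For instance, with the ff package the file is read in chunks and the
> columns are held on disk rather than in RAM (a sketch; the file name
> and chunk size are placeholders):
>
> library(ff)
> big <- read.table.ffdf(file="fil2_s.txt", header=FALSE, sep="\t",
>                        next.rows=100000)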
Interestingly, the text file was created by a selection from an SQL
database. I have access to 'db2' on an Ubuntu machine; at the bash
prompt I run

$ db2 < file2.sql

where file2.sql contains

connect to linnedb user goran using xxxxxxxxxxx
export to '/home/goran/ALC/SQL/fil2_s.txt' of del modified by coldelX09
select linneid, fodelsear, kon, ....... from u09021.fil2
connect reset
How do I get a direct connection between R and the database 'linnedb'?
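I suppose something like RODBC might do it (an untested sketch; it
assumes the IBM DB2 ODBC driver is configured under unixODBC with a
DSN named 'linnedb'):

library(RODBC)
con <- odbcConnect("linnedb", uid = "goran", pwd = "xxxxxxxxxxx")
fil2 <- sqlQuery(con,
                 "select linneid, fodelsear, kon from u09021.fil2")
odbcClose(con)

That would at least avoid the round trip through a delimited text file.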
> 2. reporting a bug -- according to the R FAQ, any low-level
> (segmentation-fault-type) crash of R when one is not messing
> around with dynamically loaded code constitutes a bug. Unfortunately,
> debugging problems like this is a huge pain in the butt.
>
> Goran, can you randomly or systematically generate an
> object of this size, write it to disk, read it back in, and
> generate the same error? In other words, does something like
>
> set.seed(1001)
> d <- data.frame(label=rep(LETTERS[1:11],1e6),
>     values=matrix(rep(1.0,11*17*1e6),ncol=17))
> write.table(d,file="big.txt")
> read.table("big.txt")
>
> do the same thing?
No, but I get new errors:
> ss <- read.table("big.txt")
Error in read.table("big.txt") : duplicate 'row.names' are not allowed
(there are no duplicates)
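One way to check that, reading only the first field of each line (a
sketch):

rn <- scan("big.txt", what = list(character()), skip = 1,
           flush = TRUE)[[1]]
anyDuplicated(rn)  # 0 here, i.e. no duplicated row names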
I tried adding an item to the first line (so that the header has as
many fields as the data lines) and
> ss <- read.table("big.txt", header = TRUE)
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 10610008 did not have 19 elements
which is wrong; that line has 19 elements.
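One can verify that along these lines (a sketch; the skip may be off
by one depending on whether the reported line number counts the header
line):

x <- scan("big.txt", what = "", sep = "\n", skip = 10610007, nlines = 1)
length(strsplit(x, " ", fixed = TRUE)[[1]])  # gives 19 fields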
Göran
> Reducing it to this kind of reproducible example will make
> it possible for others to debug it without needing to gain
> access to your huge file ...
>
--
Göran Broström