[R] Variable length datafile import problem

Ingo Reinhold ingor at kth.se
Fri Feb 18 09:16:49 CET 2011


Hi John, 

It seems there is no easy way. I'll just preprocess the file with AWK, as described here: http://www.mail-archive.com/r-help@stat.math.ethz.ch/msg53401.html

There are some remarks in that thread that R is not supposed to read files that are too large, for "political" reasons. Maybe that's it.

Many thanks again for the effort. 

Ingo
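
For reference, a minimal sketch of one possible alternative to preprocessing, under stated assumptions. The read.table documentation says the number of columns is determined from the first five lines of input unless col.names is supplied, so a ragged file whose longest rows come later can end up with too few columns. Whether that is what happens with Test.dat is an assumption, not something confirmed in this thread. The file written below is a hypothetical stand-in for the real data, and the names demo_file, n_fields and n_cols are illustrative only.

```r
# Hypothetical demo file standing in for Test.dat: tab-separated,
# rows of different lengths, with the longest row after line 5.
demo_file <- tempfile(fileext = ".dat")
writeLines(c("a\t1\t2",
             "b\t3",
             "c\t4\t5",
             "d\t6",
             "e\t7\t8",
             "f\t9\t10\t11\t12\t13"),
           demo_file)

# Count the fields on every line. Quoting and comment characters are
# switched off so that stray ' or # characters cannot change the count.
n_fields <- count.fields(demo_file, sep = "\t", quote = "", comment.char = "")
n_cols <- max(n_fields)

# Supplying col.names of the full width tells read.table how many
# columns to create; fill = TRUE pads the shorter rows with NA.
rawData <- read.table(demo_file, sep = "\t", header = FALSE, fill = TRUE,
                      quote = "", comment.char = "",
                      col.names = paste0("V", seq_len(n_cols)),
                      stringsAsFactors = FALSE)

dim(rawData)  # 6 rows, 6 columns for the demo file
```

Running table(n_fields) on the real file would also show how many lines have each field count, which may help to locate unusual rows without cutting the file by hand.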
________________________________________
From: John Kane [jrkrideau at yahoo.ca]
Sent: Thursday, February 17, 2011 11:54 AM
To: Ingo Reinhold
Subject: RE: [R] Variable length datafile import problem

Generally most of the gurus are on this list.  Hopefully someone will take an interest in the problem.

I suspect that there may be some kind of weird value in the file that is upsetting the import.  Given the results I got when I removed the data past BD and then past AL, it seems that the problem might be within this range.

You could try removing half the data between those columns and see what happens, then repeat if something turns up. It's tedious but unless someone with a better grasp of variable length data import can help it's the best I can suggest.

BTW you only replied to me.  You should make sure to cc the list otherwise readers won't realise that I am being of no help.

If you still have the problem by Saturday, e-mail me or post to the list and I'll try to spend some more time messing about with the problem.

Sorry to be of so little help.
--- On Thu, 2/17/11, Ingo Reinhold <ingor at kth.se> wrote:

> From: Ingo Reinhold <ingor at kth.se>
> Subject: RE: [R] Variable length datafile import problem
> To: "John Kane" <jrkrideau at yahoo.ca>
> Received: Thursday, February 17, 2011, 5:36 AM
> Hi John,
>
> as it seems we're hitting the wall here, can you maybe
> recommend another mailing list with "gurus" (as you put it)
> that may be able to help?
>
> Regards,
>
> Ingo
> ________________________________________
> From: John Kane [jrkrideau at yahoo.ca]
> Sent: Thursday, February 17, 2011 11:25 AM
> To: Ingo Reinhold
> Subject: RE: [R] Variable length datafile import problem
>
> Hi Ingo,
>
> I've had a bit of time to examine the file and I must say
> that, at the moment, I have no idea what is going on.
> I tried the old cut-the-file-into-pieces trick but just came up
> with even more anomalous results.
>
> In my first attempt I removed all the data past column AL in an
> OOo Calc spreadsheet.  This created a rectangular
> dataset. It imported into R with no problem, with 38 columns
> as expected.
>
> Then I took the original data file
> (test.dat) and removed all the data past column BD in an OOo
> Calc spreadsheet.
>
> This imported a file with only 38 columns.
>
> Something very funny is happening but at the moment I have
> no
>
> --- On Wed, 2/16/11, Ingo Reinhold <ingor at kth.se> wrote:
>
> > From: Ingo Reinhold <ingor at kth.se>
> > Subject: RE: [R] Variable length datafile import problem
> > To: "John Kane" <jrkrideau at yahoo.ca>
> > Received: Wednesday, February 16, 2011, 1:59 AM
> > Hi John,
> >
> > V1 should be just a character. However, I figured something
> > out myself. The import looks OK in terms of columns when
> > adding the flush=TRUE option.
> >
> > I am still very confused about the dimensions that the
> > imported data shows. Loading my data file into something
> > like OOspreadsheet shows me a maximum of about 245, which
> > does not correspond to the 146 generated by R. Any idea
> > where this saturation comes from?
> >
> > Thanks,
> >
> > Ingo
> > ________________________________________
> > From: John Kane [jrkrideau at yahoo.ca]
> > Sent: Wednesday, February 16, 2011 1:57 AM
> > To: Ingo Reinhold
> > Subject: RE: [R] Variable length datafile import problem
> >
> > Is rawData$V1 intended to be factor or character?
> >
> > str(rawData) gives
> > $ V1  : Factor w/ 54 levels "-232.0","-234.0",..: 41 41 41 41 41 41 41 41 41 41 ...
> >
> > If you were not expecting a factor you might try
> > options(stringsAsFactors = FALSE) before importing the
> > data.
> >
> > --- On Tue, 2/15/11, Ingo Reinhold <ingor at kth.se> wrote:
> >
> > > From: Ingo Reinhold <ingor at kth.se>
> > > Subject: RE: [R] Variable length datafile import problem
> > > To: "John Kane" <jrkrideau at yahoo.ca>
> > > Received: Tuesday, February 15, 2011, 3:35 PM
> > > Dear all,
> > >
> > > I have changed the file extension with no change in the
> > > result. I don't think that this should matter.
> > >
> > > http://dl.dropbox.com/u/2414056/Test.dat
> > > is a test file which represents the structure I am trying to
> > > read. So far I have used
> > >
> > > rawData=read.table("Test.txt", fill=TRUE, sep="\t", header=FALSE);
> > >
> > > When I then look at rawData$V1, this gives me a distorted
> > > view of my original first column.
> > >
> > > Thanks,
> > >
> > > Ingo