[R] Re: large survey data

Roger Bivand Roger.Bivand at nhh.no
Wed Jul 11 19:29:48 CEST 2001


On 11 Jul 2001, Douglas Bates wrote:

> Michał Bojanowski <bojanr at wp.pl> writes:
> 
> > Recently I came across a problem. I have to analyze a large survey
> > dataset: about 600 columns and 10000 rows (a tab-delimited file
> > with names in the header). I was able to import the data into an
> > object, but then there is no memory left.
> > 
> > Is there a way to import the data column by column? I have to analyze 
> > the whole data, but only two variables at a time.
> 
> You will probably need to do the data manipulation externally.
> Two possible solutions are to use a scripting language like python or
> perl or to store the data in a relational database like PostgreSQL or
> MySQL.  For data of this size I would recommend the relational
> database approach.
> 
> R has packages to connect to PostgreSQL or to MySQL.
> 
> If you want to use python instead, the code is fairly easy to write.
> To extract the first two fields (for which the index expression really
> is written 0:2, not 0:1 or 1:2 as one might expect), you could use
> 
> #!/usr/bin/env python3
> 
> import fileinput
> 
> # Print the first two tab-separated fields of every input line.
> for line in fileinput.input():
>     flds = line.rstrip("\n").split("\t")
>     print("\t".join(flds[0:2]))
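For comparison, here is a minimal sketch of the database route. It assumes Python's standard sqlite3 module as a lightweight stand-in for PostgreSQL or MySQL (a substitution, not what was suggested above). The file is streamed into a table once; after that, each query pulls only the two columns needed. `load_tsv` and `two_columns` are hypothetical helper names, and every value is stored as text.

```python
import csv
import sqlite3


def _q(name):
    """Quote an SQL identifier (column or table name) for SQLite."""
    return '"' + name.replace('"', '""') + '"'


def load_tsv(conn, path, table="survey"):
    """Stream a tab-delimited file with a header row into an SQLite table.

    Rows are inserted one at a time from the csv reader, so the whole file
    is never held in memory. All columns are declared TEXT (an assumption).
    """
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter="\t")
        header = next(reader)
        cols = ", ".join(_q(name) + " TEXT" for name in header)
        conn.execute("CREATE TABLE %s (%s)" % (_q(table), cols))
        marks = ", ".join(["?"] * len(header))
        rows = (row for row in reader if row)  # skip blank lines
        conn.executemany("INSERT INTO %s VALUES (%s)" % (_q(table), marks), rows)
    conn.commit()


def two_columns(conn, a, b, table="survey"):
    """Return columns a and b as a list of (a, b) tuples, in file order."""
    query = "SELECT %s, %s FROM %s ORDER BY rowid" % (_q(a), _q(b), _q(table))
    return conn.execute(query).fetchall()
```

For example, `conn = sqlite3.connect("survey.db")` (a hypothetical file name) keeps the table on disk, so the large file is parsed only once. Later sessions can then call `two_columns(conn, "Var5", "Var47")` directly.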

Or using awk/gawk, if you prefer, to choose the fields:

> xx <- matrix(runif(5000), 100, 50)
> colnames(xx) <- paste("Var", 1:ncol(xx), sep="")
> write.table(as.data.frame(xx), "tryout.txt", row.names=FALSE, sep="\t")
> cols.I.want <- c(5, 47)
> # awk prints only fields 5 and 47; pipe() feeds its output to read.table()
> xx.I.want <- read.table(pipe(paste("awk -F\"\t\" 'BEGIN{OFS=\"\t\"}{print $",
+ cols.I.want[1], ", $", cols.I.want[2], "}' tryout.txt", sep="")),
+ header=TRUE)
> summary(xx.I.want[,1] - xx[,cols.I.want[1]])
> summary(xx.I.want[,1] - xx[,cols.I.want[1]])

Here pipe() lets read.table() read awk's output on the fly, so only the
chosen columns ever reach R's memory. Generalising to an arbitrary number
of chosen columns should also be possible.
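As a rough illustration of that generalisation, here is a hypothetical Python filter, `select_columns`, that keeps any number of columns chosen by header name rather than by position. It assumes the first line is the header. Because it uses the csv module, quoted names such as those written by write.table() are unquoted on the way in.

```python
import csv
import sys


def select_columns(infile, outfile, wanted):
    """Copy only the columns named in `wanted` (in that order) from a
    tab-delimited stream with a header row; one row is held at a time."""
    reader = csv.reader(infile, delimiter="\t")
    writer = csv.writer(outfile, delimiter="\t", lineterminator="\n")
    header = next(reader)
    # Raises ValueError if a requested name is not in the header.
    idx = [header.index(name) for name in wanted]
    writer.writerow(wanted)
    for row in reader:
        if not row:  # skip blank lines
            continue
        writer.writerow([row[i] for i in idx])


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 select_columns.py Var5 Var47 < tryout.txt
    select_columns(sys.stdin, sys.stdout, sys.argv[1:])
```

Saved as a script (the name select_columns.py is only an example), its output could be read in R through pipe() exactly like the awk command, e.g. `read.table(pipe("python3 select_columns.py Var5 Var47 < tryout.txt"), header=TRUE)`.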

Roger

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Breiviksveien 40, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 93 93
e-mail: Roger.Bivand at nhh.no
and: Department of Geography and Regional Development, University of
Gdansk, al. Mar. J. Pilsudskiego 46, PL-81 378 Gdynia, Poland.


-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._


