[R] Re: large survey data
Roger Bivand
Roger.Bivand at nhh.no
Wed Jul 11 19:29:48 CEST 2001
On 11 Jul 2001, Douglas Bates wrote:
> Michał Bojanowski <bojanr at wp.pl> writes:
>
> > Recently I came across a problem. I have to analyze a large survey
> > data set, roughly 600 columns and 10000 rows (tab-delimited file
> > with names in the header). I was able to import the data into an
> > object, but there is no more memory left.
> >
> > Is there a way to import the data column by column? I have to analyze
> > the whole data, but only two variables at a time.
>
> You will probably need to do the data manipulation externally.
> Two possible solutions are to use a scripting language like python or
> perl or to store the data in a relational database like PostgreSQL or
> MySQL. For data of this size I would recommend the relational
> database approach.
>
> R has packages to connect to PostgreSQL or to MySQL.
>
> If you want to use python instead the code is fairly easy to write.
> Extracting the first two fields (for which the index expression really
> is written 0:2, not 0:1 or 1:2 as one might expect), you could use
>
> #!/usr/bin/env python
>
> import fileinput
>
> # Print the first two tab-separated fields of every input line.
> for line in fileinput.input():
>     flds = line.rstrip("\n").split("\t")
>     print("\t".join(flds[0:2]))
Or using awk/gawk, if you prefer, to choose the fields:
> xx <- matrix(runif(5000), 100, 50)
> col <- character(ncol(xx))
> for (i in 1:ncol(xx)) col[i] <- paste("Var", i, sep="")
> colnames(xx) <- col
> write.table(as.data.frame(xx), "tryout.txt", row.names=FALSE, sep="\t")
> cols.I.want <- c(5, 47)
> xx.I.want <- read.table(pipe(paste("awk -F\"\t\" 'BEGIN{OFS=\"\t\"}{print $",
+ cols.I.want[1], ", $", cols.I.want[2], "}' tryout.txt", sep="")),
+ header=TRUE)
> summary(xx.I.want[,1] - xx[,cols.I.want[1]])
and pipe() to read on the fly, maybe? Generalising to an arbitrary number
of chosen columns would also be possible.
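As a rough sketch of that generalisation, written in Python to match the filter quoted above rather than in awk: the function name select_columns and the tiny in-memory demo file are made up for illustration, and the column numbers are taken to be 1-based, as in awk's $5 and $47.

```python
import io


def select_columns(lines, cols, sep="\t"):
    """Yield each input line reduced to the 1-based column numbers in cols."""
    idx = [c - 1 for c in cols]  # awk-style 1-based -> Python 0-based
    for line in lines:
        fields = line.rstrip("\r\n").split(sep)
        # Keep the columns in the order requested, not in file order.
        yield sep.join(fields[i] for i in idx)


if __name__ == "__main__":
    # Small in-memory stand-in for a tab-delimited file with a header row.
    demo = io.StringIO("Var1\tVar2\tVar3\tVar4\n1\t2\t3\t4\n5\t6\t7\t8\n")
    for row in select_columns(demo, [2, 4]):
        print(row)
```

Only one line is held in memory at a time, so the size of the full file should not matter much. Saved as a script that reads a real file, something like this could presumably be called through pipe() in the same way as the awk command.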
Roger
--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Breiviksveien 40, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 93 93
e-mail: Roger.Bivand at nhh.no
and: Department of Geography and Regional Development, University of
Gdansk, al. Mar. J. Pilsudskiego 46, PL-81 378 Gdynia, Poland.