[R] skip non-sequential lines using scan?

Matthew Keller mckellercran at gmail.com
Thu Nov 8 18:37:47 CET 2007


Hi all,

Thanks for the advice! Gabor, I've been putting off getting into
SQLite. I may need to bite the bullet and learn it.

Jim - thanks for the help - and yes, I'd read that old post. My
problem is that, with the other objects already in memory, I cannot
pull the whole matrix in (in reality, it has 3200 rows - the 100 was
just my example).

So it looks like for now, I'll be looping... SQL down the road.
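
For the archive, here is a minimal sketch of the looping approach, not a tuned solution. It keeps one connection open and, for each wanted line, passes scan() a scalar "skip" counted from the current position. The small temporary file, and the names path and gaps, are assumptions made up for illustration; the real file would be far wider.

```r
# Stand-in data: 100 lines with 3 whitespace-separated fields each
# (hypothetical; the real file has 3200 rows and 1e7 columns).
path <- tempfile()
writeLines(sprintf("r%d_a r%d_b r%d_c", 1:100, 1:100, 1:100), path)

rows.I.want <- seq(5, 99, 2)

# Number of lines to skip before each wanted line, measured from
# wherever the previous read left the connection.
gaps <- diff(c(0, rows.I.want)) - 1

con <- file(path, open = "r")
new <- vector("list", length(rows.I.want))
for (i in seq_along(rows.I.want)) {
  # An already-open connection is read from its current position,
  # so each call skips the gap and then reads exactly one line.
  new[[i]] <- scan(con, what = "character", skip = gaps[i],
                   nlines = 1, quiet = TRUE)
}
close(con)

# One row per wanted line, one column per field
new <- do.call(rbind, new)
```

Only one line is held at a time while reading, so memory use is driven by the rows kept rather than by the whole file.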

QUESTION: is there any way that the Gods of R would consider allowing
the "skip" argument in scan() to deal with vectors? Maybe it's not so
easy, but if so...

Thanks all!

Matt




On 11/8/07, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> Don't know if SQLite can handle that many columns, but if it can and if the
> file is in an acceptable format, then sqldf simplifies the interface to reading
> it into an SQLite database that it automatically creates on the fly and then
> gets a subset out of it into R.  (If it will fit into memory you can omit the
> dbname= argument.)
>
>    library(sqldf)
>    source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R")
>
>    myfile <- file("myfile.dat")
>    sqldf("select * from myfile where rowid % 2 = 1 and rowid >= 5",
>          dbname = tempfile())
>
> See example 6 on the home page:
> http://sqldf.googlecode.com
>
>
> On Nov 8, 2007 4:19 AM, Matthew Keller <mckellercran at gmail.com> wrote:
> > Hi all,
> >
> > Is there a way to skip non-sequential lines using the "skip" argument
> > in the scan function?
> >
> > E.g., I have a matrix with 100 rows and 1e7 columns. I open a
> > connection and want to read only lines 5, 7, 9, etc [i.e.,
> > seq(5,99,2)]
> >
> > It might seem that the syntax to do this would be something like this
> > (if only the "skip" allowed vectors in the same way colClasses does in
> > read.table):
> >
> > con <- file("bigfile",open="r")
> > rows.I.want <- seq(5,99,2)
> > new <- scan(con,what="character",skip=rows.I.want-1,nlines=rows.I.want)
> >
> > The above doesn't work - it would read lines 5, 6, 7, ...
> > length(seq(5,99,2)) rather than 5, 7, 9, ... 99. Yes, I know I can
> > accomplish this by looping, but with the huge datasets I'll be working
> > with, I'd like to try to save time by doing it all at once. Any ideas?
> >
> > Matt
> >
> >
> >
> > --
> > Matthew C Keller
> > Asst. Professor of Psychology
> > University of Colorado at Boulder
> > www.matthewckeller.com
>


-- 
Matthew C Keller
Asst. Professor of Psychology
University of Colorado at Boulder
www.matthewckeller.com


