[R] how to skip certain rows when reading data

Henrik Bengtsson hb at stat.berkeley.edu
Sat Jul 29 03:14:20 CEST 2006


Have a look at readTable() in the R.utils package.  It can do quite a
few things, such as reading subsets of rows and specifying colClasses
by column name.  It is implemented to keep memory usage as small
as possible.  Note the warning on the help page: "WARNING: This method is
very much in an alpha stage. Expect it to change."  It should work
though.

Examples:

# Read every fourth row (among the first 1000 rows)
df <- readTable(pathname, rows=seq(from=1, to=1000, by=4));

# Read only columns 'chromosome' and 'position'.
df <- readTable(pathname, colClasses=c("chromosome"="character",
  "position"="double"), defColClass="NULL", header=TRUE, sep="\t");

# Read 'log2' data chromosome by chromosome
chromosome <- readTableIndex(pathname, indexColumn=3, header=TRUE, sep="\t")
for (cc in unique(chromosome)) {
  rows <- which(chromosome == cc);
  df <- readTable(pathname, rows=rows, colClasses=c("log2"="double"),
    defColClass="NULL", header=TRUE, sep="\t");
  ...
}
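
For comparison, the same row and column selections can be sketched with
base R only.  This is a minimal illustration, not a replacement for
readTable(): it reads the whole file into memory first, so it does not
have the small memory footprint described above.  The sample file, its
three columns and the row numbers 3 and 7 are made up for the example.

```r
# Hypothetical sample file, created only so the sketch is self-contained.
pathname <- tempfile(fileext = ".txt")
sample_data <- data.frame(
  chromosome = rep(c("1", "2"), each = 6),
  position   = seq(100, by = 100, length.out = 12),
  log2       = round(seq(-1, 1, length.out = 12), 2),
  stringsAsFactors = FALSE
)
write.table(sample_data, file = pathname, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Read everything, then keep every fourth data row.
all_rows <- read.table(pathname, header = TRUE, sep = "\t",
                       colClasses = c("character", "double", "double"))
every_fourth <- all_rows[seq(from = 1, to = nrow(all_rows), by = 4), ]

# Read everything, then drop rows whose numbers are known in advance.
without_rows <- all_rows[-c(3, 7), ]

# Skip the 'log2' column at read time: a colClasses entry of "NULL"
# tells read.table() to drop that column.
positions <- read.table(pathname, header = TRUE, sep = "\t",
                        colClasses = c("character", "double", "NULL"))
```

Negative indices drop rows and positive indices keep them, so both
"skip these rows" and "read only these rows" reduce to ordinary data
frame indexing once the data is in memory.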

Cheers

Henrik

On 7/27/06, Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
> On Thu, 27 Jul 2006, jz7 at duke.edu wrote:
>
> > Dear all,
> >
> > I am reading the data using "read.table". However, there are a few rows I
> > want to skip. How can I do that in an easy way? Suppose I know the row
> > number that I want to skip. Thanks so much!
>
> The easy way is to read the whole data frame and use indexing (see `An
> Introduction to R') to remove the rows you do not want to retain.
> E.g. to remove rows 17 and 137
>
> mydf <- read.table(...)[-c(17, 137), ]
>
> --
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595


