[R] read.table: skipping trailing delimiters

Tue May 4 18:27:30 CEST 2010

On May 4, 2010, at 11:11 AM, Marshall Feldman wrote:

> Hi,
> 
> I am trying to read a tab-delimited file that has trailing tab delimiters. It's a simple file with two legitimate fields. I'm using the first as row.names, and the second should be the only column in the resulting data frame.
> 
> Initially, R was filling the last column with NA's, but I was able to stop that by setting colClasses=c("character","character",NULL). Still, the data frame is coming in with an extra column, only now its values are set to "".
> 
> Is there any way to skip the trailing delimited field entirely? I've searched for an answer without luck.
> 
>    Thanks.
>    Marsh Feldman

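The easiest way to remove a single final column is to post-process the data frame after importing it. So if your imported data frame is called 'DF':

  DF.New <- DF[, -ncol(DF), drop = FALSE]

The drop = FALSE matters in your case: once the first field is used for row.names, only two columns remain, and removing one of them without it would return a plain vector rather than a data frame.

See ?ncol and ?Extract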
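
One caveat with negative column indexing: if only one column is left, R returns a plain vector unless drop = FALSE is given. Here is a small sketch of the difference. The data frame is made up to resemble the import described in the question (one real column, one empty trailing column, first field as row names), and the names value and extra are hypothetical:

```r
# Hypothetical stand-in for the imported data: one real column plus an
# empty extra column created by the trailing tab.
DF <- data.frame(value = c("alpha", "beta", "gamma"),
                 extra = c("", "", ""),
                 row.names = c("id1", "id2", "id3"),
                 stringsAsFactors = FALSE)

# Default behaviour: the single remaining column collapses to a vector.
v <- DF[, -ncol(DF)]
is.data.frame(v)

# With drop = FALSE the result stays a data frame and keeps its row names.
DF.New <- DF[, -ncol(DF), drop = FALSE]
str(DF.New)
```

With three or more columns the default is harmless, so this only bites when a single column survives.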

You could also do more complex subsetting with subset() (see ?subset), or pre-process the file before importing it with command line tools such as cut or awk.
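
If command line tools are not handy, the same kind of pre-processing can be approximated inside R. This is only a sketch under assumptions: the three lines below are invented stand-ins for the file (in practice they might come from readLines() on your file), and the column names id and value are hypothetical:

```r
# Hypothetical file contents: every line ends with a stray tab.
lines <- c("id1\talpha\t", "id2\tbeta\t", "id3\tgamma\t")

# Strip one or more trailing tabs from each line before parsing.
clean <- sub("\t+$", "", lines)

# Parse the cleaned lines; the first field becomes the row names.
DF <- read.table(text = clean, sep = "\t", header = FALSE,
                 row.names = 1, col.names = c("id", "value"),
                 colClasses = "character")
str(DF)
```

Note that this would also strip a final field that is legitimately empty, so it only suits files where the trailing tab is always junk.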

For example, dropping the last column of the 'iris' data set:

> str(iris)
'data.frame':	150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

> str(iris[, -ncol(iris)])
'data.frame':	150 obs. of  4 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
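
To skip the trailing field at import time instead, read.table will omit any column whose colClasses entry is the quoted string "NULL". An unquoted NULL is dropped by c(), which leaves a length-2 vector that is recycled across the three columns; that would explain why the extra column came in as "". A minimal sketch, assuming a made-up three-line sample in place of your file (txt and the column names id, value and junk are hypothetical):

```r
# Hypothetical sample: two real fields followed by a trailing tab.
txt <- "id1\talpha\t\nid2\tbeta\t\nid3\tgamma\t\n"

# c() silently discards an unquoted NULL, so this has length 2, not 3.
length(c("character", "character", NULL))

# The quoted "NULL" tells read.table to skip the third column entirely.
DF <- read.table(text = txt, sep = "\t", header = FALSE,
                 row.names = 1,
                 colClasses = c("character", "character", "NULL"),
                 col.names = c("id", "value", "junk"))
str(DF)
```

With a real file, pass the file name as the first argument instead of text =. colClasses is given per column of the file, so it includes the column used for row.names.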

HTH,

Marc Schwartz