[R] Strange column shifting with read.table

Sun Aug 2 23:35:07 CEST 2009

On 02-Aug-09 21:10:12, Noah Silverman wrote:
> Hi,
> I am reading in a dataframe from a CSV file. It has 70 columns.
> I do not have any kind of unique "row id".
> 
> rawdata <- read.table("r_work/train_data.csv", header=T, sep=",", 
>                       na.strings=0)
> 
> When training an svm, I keep getting an error
> So, as an experiment, I wrote the data back out to a new file
> so that I could see what the svm function sees.
> 
> write.table(rawdata, file="r_work/output_data.csv",
>             quote=FALSE, sep=",")
> 
> It appears as if R has added a column for me with id numbers
> for each row.  That would be fine, except that R SHIFTS ALL MY
> COLUMN LABELS OVER ONE.  That causes several problems:
>      1) The header names are now wrong for each column
>      2) My last column has no label
>      3) The SVM complains about the unlabeled column
> 
> Would someone please help me sort this out.
> Thanks!
> -N

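Note that the default for "row.names" in write.table() is TRUE.
So, in your command, that is what you get: write.table() then
*creates* row-names (by default the row numbers), but writes no
header entry for that extra first column, which is why the labels
look shifted over by one. Compare: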

  D <- rbind(c(1.1,1.2,1.3),c(2.1,2.2,2.3),c(3.1,3.2,3.3))
  D
  #      [,1] [,2] [,3]
  # [1,]  1.1  1.2  1.3
  # [2,]  2.1  2.2  2.3
  # [3,]  3.1  3.2  3.3

  write.table(D,file="withTRUE.csv",quote=FALSE,sep=",")
  # withTRUE.csv:
  # V1,V2,V3
  # 1,1.1,1.2,1.3
  # 2,2.1,2.2,2.3
  # 3,3.1,3.2,3.3

  write.table(D,file="withFALSE.csv",row.names=FALSE,quote=FALSE,sep=",")
  # withFALSE.csv:
  # V1,V2,V3
  # 1.1,1.2,1.3
  # 2.1,2.2,2.3
  # 3.1,3.2,3.3
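
If you do want to keep the row numbers in the file, a third option
is col.names=NA, which writes an empty header field above the
row-names column so that header and data line up again. A minimal
sketch follows; the small data frame and the temporary file names
are made up for illustration and are not your 70-column data:

```r
# Hypothetical stand-in for the real data (assumption: any data frame will do).
D <- data.frame(a = c(1.1, 2.1), b = c(1.2, 2.2))

f_default <- tempfile(fileext = ".csv")  # row.names = TRUE (the default)
f_norows  <- tempfile(fileext = ".csv")  # row.names = FALSE
f_blank   <- tempfile(fileext = ".csv")  # row names kept, blank header field

write.table(D, file = f_default, quote = FALSE, sep = ",")
write.table(D, file = f_norows,  quote = FALSE, sep = ",", row.names = FALSE)
write.table(D, file = f_blank,   quote = FALSE, sep = ",", col.names = NA)

# First line of each file: the header.
header_default <- readLines(f_default)[1]  # "a,b"   (one field short of the data rows)
header_norows  <- readLines(f_norows)[1]   # "a,b"   (matches the data rows)
header_blank   <- readLines(f_blank)[1]    # ",a,b"  (empty field for the row names)

# read.table() treats a header that is one field short as a sign that
# the first column holds row names, so the default file reads back
# with the original two columns.
back <- read.table(f_default, header = TRUE, sep = ",")
```

Other programs reading the file may not apply that one-field-short
rule, which is why row.names=FALSE or col.names=NA is usually the
safer choice for files leaving R.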

Hoping this helps,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 02-Aug-09                                       Time: 22:35:04
------------------------------ XFMail ------------------------------