[R] Tab Separated File Reading Error
arun
smartpink111 at yahoo.com
Fri Oct 4 18:38:16 CEST 2013
Hi,
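By default read.table() uses quote = "\"'" and comment.char = "#", so a stray apostrophe, double quote or # on an earlier line can merge or truncate fields, and the error may only surface at a later line. read.csv() already sets comment.char = ""; adding quote = "" turns off quoting as well. Try: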
annoTranscripts <- read.csv("matched.txt", sep = '\t', stringsAsFactors = FALSE, quote = "", header = FALSE)
str(annoTranscripts)
'data.frame': 367274 obs. of 12 variables:
$ V1 : chr "comp103529_c0_seq1" "comp129123_c0_seq1" "comp129123_c0_seq1" "comp129124_c0_seq1" ...
$ V2 : chr "XM_003723822" "XM_778057" "EU116908" "XM_786928" ...
$ V3 : chr "PREDICTED: Strongylocentrotus purpuratus neuromedin-U receptor 2-like (LOC100888633), mRNA" "PREDICTED: Strongylocentrotus purpuratus 60S ribosomal protein L30-like (LOC577852), mRNA" "Barentsia elongata putative ribosomal protein L30 mRNA, complete cds" "PREDICTED: Strongylocentrotus purpuratus 60S ribosomal protein L29-1-like (LOC587182), mRNA" ...
$ V4 : int 91 392 69 149 149 451 399 203 193 185 ...
$ V5 : int 136 479 203 209 209 541 463 451 456 472 ...
$ V6 : int 15 16 40 20 20 24 20 71 83 85 ...
$ V7 : int 0 11 4 0 0 5 1 10 4 9 ...
$ V8 : num 2e-38 0e+00 6e-26 2e-70 2e-70 ...
$ V9 : int 1 22 210 135 135 131 189 205 196 185 ...
$ V10: int 136 499 410 343 343 669 650 650 649 653 ...
$ V11: int 576 159 27 1 1 1 21 23 140 22 ...
$ V12: int 441 627 227 209 209 538 483 468 593 487 ...
dim(annoTranscripts)
[1] 367274 12
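A minimal sketch of why disabling quoting helps, using a small hypothetical file (created with tempfile(), not the real matched.txt) whose description column contains apostrophes as in 5' and 3' gene notation. read.delim() is the tab-separated counterpart of read.csv(); with quote = "" and comment.char = "" it treats ', " and # as ordinary characters.

```r
# Hypothetical three-line tab-separated file; the descriptions contain
# apostrophes, which read.table() treats as quote characters by default.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id1\tXM_1\tprotein 5' end\t10",
             "id2\tXM_2\tplain text\t20",
             "id3\tXM_3\tother 3' end\t30"), tmp)

# Default quoting: depending on where the apostrophes fall, this may error,
# warn, or return fewer rows than the file has lines (inspect `bad`).
bad <- tryCatch(read.table(tmp, sep = "\t", stringsAsFactors = FALSE),
                error = function(e) conditionMessage(e))

# Quoting and comments disabled: every line becomes exactly one row.
ok <- read.delim(tmp, header = FALSE, quote = "", comment.char = "",
                 stringsAsFactors = FALSE)
dim(ok)    # 3 4
ok$V3[1]   # "protein 5' end"
```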
A.K.
----- Original Message -----
From: Dario Strbenac <dstr7320 at uni.sydney.edu.au>
To: "r-help at r-project.org" <r-help at r-project.org>
Cc:
Sent: Friday, October 4, 2013 8:00 AM
Subject: [R] Tab Separated File Reading Error
Hello,
I have a seemingly simple problem: a tab-delimited file can't be read in.
> annoTranscripts <- read.table("matched.txt", sep = '\t', stringsAsFactors = FALSE)
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 5933 did not have 12 elements
However, all lines do have 12 columns.
> lines <- readLines("matched.txt")
> tabsPosns <- gregexpr("\t", lines)
> table(sapply(tabsPosns, length))
11
367274
> system("wc -l matched.txt")
367274 matched.txt
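Counting tabs ignores quoting, which is one way it can disagree with read.table(). A sketch, on a hypothetical test file, comparing count.fields() under read.table()'s default quote and comment settings with both turned off; lines where the counts differ, or where the default count comes back NA (for example because a quoted field spans a line break), are the ones affected by quoting.

```r
# Hypothetical test file with apostrophes on lines 1 and 3.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id1\tXM_1\tprotein 5' end\t10",
             "id2\tXM_2\tplain text\t20",
             "id3\tXM_3\tother 3' end\t30"), tmp)

# Field counts with read.table()'s defaults (quote = "\"'", comment.char = "#").
n_default <- count.fields(tmp, sep = "\t")
# Field counts with quoting and comments off: simply tabs + 1 on every line.
n_plain <- count.fields(tmp, sep = "\t", quote = "", comment.char = "")

table(n_plain)
# Line numbers where the two counts disagree, including NA from open quotes.
which(is.na(n_default) | n_default != n_plain)
```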
You can obtain the file from https://dl.dropboxusercontent.com/u/37992150/matched.txt
Line 5933 does not contain comment or quote characters. What can you suggest?
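Because an opening quote on an earlier line can carry over into later lines, checking only the line named in the error may not be enough. A sketch of a hypothetical helper, special_lines(), that lists every line containing a character read.table() treats specially by default (", ' or #), shown on a small made-up file:

```r
# Hypothetical helper: line numbers containing characters that read.table()
# handles specially with its defaults (quote = "\"'", comment.char = "#").
special_lines <- function(path) {
  lines <- readLines(path, warn = FALSE)
  which(grepl("[\"'#]", lines))
}

# Made-up example file with apostrophes on lines 1 and 3.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id1\tXM_1\tprotein 5' end\t10",
             "id2\tXM_2\tplain text\t20",
             "id3\tXM_3\tother 3' end\t30"), tmp)

special_lines(tmp)   # 1 3
```

On the real data this would be special_lines("matched.txt").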
> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
[5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
loaded via a namespace (and not attached):
[1] tools_3.0.1
--------------------------------------
Dario Strbenac
PhD Student
University of Sydney
Camperdown NSW 2050
Australia