[R] read.delim skips first column (why?)

Tue Jul 14 11:34:22 CEST 2009

Hi

> str(read.table("test.txt", header=T))
'data.frame':   9 obs. of  12 variables:
 $ snp                      : Factor w/ 9 levels "rs1113188","rs1113397",..: 9 5 7 8 3 4 6 1 2
 $ gene                     : Factor w/ 1 level "TRP2": 1 1 1 1 1 1 1 1 1
 $ chromosome               : int  3 3 3 3 3 3 3 3 3

Reading files into R can sometimes be tricky. If read.delim fails, I would
recommend trying read.table, which makes fewer assumptions, and setting its
parameters (header, sep, dec, ...) explicitly to get your file read correctly.
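
One documented behaviour of read.table can produce a shift like the one
described in the original post: if the header line has exactly one field
fewer than the data rows, the first column is used as row names and the
header labels move over to the remaining columns. A trailing tab on every
data row is enough to cause this. Whether that is what happened with this
particular file is an assumption; the miniature file below is made up for
illustration and is not the poster's data. A minimal sketch of how to check
the field counts and read the file without the shift:

```r
# Hypothetical miniature of the file: the header has 3 fields, but each
# data row ends with an extra tab, so it has 4 fields (the last one empty).
tmp <- tempfile(fileext = ".txt")
writeLines(c("snp_id\tgene\tchromosome",
             "rs1\tRAPT2\t3\t",
             "rs2\tRAPT2\t3\t"), tmp)

# Count the fields on every line; unequal counts point at the problem.
n_fields <- count.fields(tmp, sep = "\t", quote = "\"")
print(n_fields)

# Default behaviour: the first column becomes the row names and the
# header labels are applied to the remaining columns.
shifted <- read.delim(tmp)
print(rownames(shifted))
print(shifted$snp_id)

# One possible workaround: read the header and the data separately, then
# keep only as many columns as there are header names. This assumes the
# surplus columns are empty trailing fields.
hdr <- scan(tmp, what = "", sep = "\t", nlines = 1, quiet = TRUE)
dat <- read.table(tmp, header = FALSE, sep = "\t", skip = 1,
                  quote = "\"", stringsAsFactors = FALSE)
fixed <- dat[, seq_along(hdr), drop = FALSE]
names(fixed) <- hdr
print(fixed)
```

If count.fields reports the same number on every line, this is not the
cause, and the separator or quoting should be checked instead.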

Regards
Petr

r-help-bounces at r-project.org wrote on 14.07.2009 11:11:10:

> Hi,
> I have uploaded a copy of the file here:
> - http://pastebin.com/fd0edfab
> 
> The file has also been passed through the Unix command-line tool unexpand,
> but that doesn't solve the problem.
> 
> Using head=TRUE instead of head=T has the same effect.
> 
> The output of print(names(ngly)) is:
> > print(names(ngly), quote=TRUE)
>  [1] "snp"                       "gene"
>  [3] "chromosome"                "distance_from_gene_center"
>  [5] "position"                  "ame"
>  [7] "csasia"                    "easia"
>  [9] "eur"                       "mena"
> [11] "oce"                       "ssafr"
> [13] "X"                         "X.1"
> [15] "X.2"
> 
> Thank you to everyone who replied to me directly by email, but I
> haven't been able to solve the problem yet.
> 
> 
> On Tue, Jul 14, 2009 at 12:36 AM, jim holtman <jholtman at gmail.com> wrote:
> 
> > Can you send your file as an attachment since it is impossible to see
> > where the separator characters are.
> >
> > On Mon, Jul 13, 2009 at 1:27 PM, Giovanni Marco
> > Dall'Olio<dalloliogm at gmail.com> wrote:
> > > Hi people,
> > > I have a text file like this one posted:
> > >
> > > snp_id     gene   chromosome  distance_from_gene_center  position    pop1       pop2       pop3       pop4       pop5       pop6       pop7
> > > rs2129081  RAPT2  3           -129993                    "upstream"  0.439009   1.169210   NA         0.233020   0.093042   NA         -0.902596
> > > rs1202698  RAPT2  3           -128695                    "upstream"  NA         1.815000   NA         0.399079   1.814270   1.382950   NA
> > > rs1163207  RAPT2  3           -128224                    "upstream"  NA         NA         NA         NA         NA         NA         NA
> > > rs1834127  RAPT2  3           -128106                    "upstream"  NA         NA         NA         NA         NA         NA         2.180670
> > > rs2114211  RAPT2  3           -126738                    "upstream"  -0.468279  -1.447620  NA         0.010616   -0.414581  NA         0.550447
> > > rs2113151  RAPT2  3           -124620                    "upstream"  -0.897660  -1.971020  NA         -0.920327  -0.764658  NA         0.337127
> > > rs2524130  RAPT2  3           -123029                    "upstream"  -0.109795  -0.004646  -0.412059  1.116740   0.667567   -0.924529  0.962841
> > > rs1381318  RAPT2  3           -12818                     "upstream"  -0.911662  -1.791580  NA         -0.945716  -1.239640  NA         0.004876
> > > rs2113319  RAPT2  3           -122028                    "upstream"  -0.911662  -1.738610  NA         -0.945716  -1.240950  NA         -0.005318
> > >
> > > When I use read.delim (or any read function) on it, R skips the first
> > > column, and I don't understand why.
> > >
> > > For example:
> > > $: R
> > >> data = read.delim('snp_file.txt', head=T, sep='\t')
> > >
> > > Now, I would expect data$snp_id to contain snp ids, and data$gene to contain
> > > gene names; but it is not like this:
> > >
> > >> data$snp_id
> > > [1] RAPT2 RAPT2 RAPT2 RAPT2 RAPT2 RAPT2 RAPT2 RAPT2 RAPT2
> > > Levels: RAPT2
> > >> data$gene
> > > [1] 3 3 3 3 3 3 3 3 3
> > >
> > >> summary(data)
> > >  snp_id       gene     chromosome      distance_from_gene_center
> > >  RAPT2:9   Min.   :3   Min.   :-129993   upstream:9
> > >           1st Qu.:3   1st Qu.:-128224
> > >           Median :3   Median :-126738
> > >           Mean   :3   Mean   :-113806
> > >           3rd Qu.:3   3rd Qu.:-123029
> > >           Max.   :3   Max.   : -12818
> > > ....
> > >
> > >> data$pop7
> > > [1] NA NA NA NA NA NA NA NA NA
> > >
> > >
> > > Notice that it did use snp_id as the header for the first column, but it
> > > completely skips all the data from that column, and all the fields are
> > > shifted, so the last column is filled with NA values.
> > >
> > > What am I doing wrong? Could it be a problem with my data files? I have tried
> > > to modify them a bit (add new columns, etc.) but it didn't work.
> > >
> > > I am running R from an Ubuntu system:
> > >> sessionInfo()
> > > R version 2.9.1 (2009-06-26)
> > > i486-pc-linux-gnu
> > >
> > > locale:
> > > LC_CTYPE=it_IT.UTF-8;LC_NUMERIC=C;LC_TIME=it_IT.UTF-8;LC_COLLATE=it_IT.UTF-8;LC_MONETARY=C;LC_MESSAGES=it_IT.UTF-8;LC_PAPER=it_IT.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=it_IT.UTF-8;LC_IDENTIFICATION=C
> > >
> > > attached base packages:
> > > [1] stats     graphics  grDevices utils     datasets  methods   base
> > >
> > >
> > >
> > >
> > > --
> > > Giovanni Dall'Olio, phd student
> > > Department of Biologia Evolutiva at CEXS-UPF (Barcelona, Spain)
> > >
> > > My blog on bioinformatics: http://bioinfoblog.it
> > >
> > >        [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> >
> >
> >
> > --
> > Jim Holtman
> > Cincinnati, OH
> > +1 513 646 9390
> >
> > What is the problem that you are trying to solve?
> >
> 
> 
> 
> -- 
> Giovanni Dall'Olio, phd student
> Department of Biologia Evolutiva at CEXS-UPF (Barcelona, Spain)
> 
> My blog on bioinformatics: http://bioinfoblog.it
> 