[R] read.table truncated data?

jim holtman jholtman at gmail.com
Thu Aug 25 19:50:33 CEST 2011


When you have an unbalanced quote, it may be hard to determine exactly
where it is.  It is probably up to the user to determine whether there is
truncation.  In some cases you might have data within quotes that
legitimately spans several lines.  You might also read up on
the 'fill' and 'flush' parameters, which take care of some other
conditions.  The 'read.table' functions assume that the data format is
well formed; if you have concerns about your data, then some
preprocessing might be in order.  You can do this with external
programs like 'perl', or within R by using readLines to read in the data
and look for potential problems.
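
A minimal sketch of the readLines approach, in R. The file contents, the
temporary path, and the helper name count_char are made up for
illustration; they are not the original poster's data. The idea is to flag
lines with an odd number of quote characters, then check that reading with
quote = "" and comment.char = "" keeps every row. An odd count is only a
heuristic: a line with two stray apostrophes would look balanced.

```r
# Hypothetical tab-separated file standing in for the real data
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tname\tvalue",
             "1\talpha\t1.5",
             "2\t5' UTR\t2.5",
             "3\tgamma\t3.5",
             "4\t3' UTR\t4.5",
             "5\tepsilon\t5.5"), tmp)

lines <- readLines(tmp)

# Count occurrences of a literal character in each line
# (count_char is an illustrative helper, not a base R function)
count_char <- function(x, ch) {
  lengths(regmatches(x, gregexpr(ch, x, fixed = TRUE)))
}

n_single <- count_char(lines, "'")
n_double <- count_char(lines, "\"")

# File line numbers (header included) with an odd number of quotes
suspect <- which(n_single %% 2 == 1 | n_double %% 2 == 1)
print(suspect)

# With quoting and comment handling disabled, every line should have
# the same number of fields
n_fields <- count.fields(tmp, sep = "\t", quote = "", comment.char = "")
print(table(n_fields))

# Reading with quoting disabled keeps one row per data line
clean <- read.table(tmp, header = TRUE, sep = "\t",
                    quote = "", comment.char = "")
print(nrow(clean) == length(lines) - 1)
```

Comparing nrow() of the result with the number of lines in the file, as in
the last step, is a cheap check to run after any read.table call on data
you do not fully trust.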

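On why the quotes are not near the apparent truncation point: one reading
that is consistent with the numbers posted in this thread (an inference,
not something the posters confirmed) is that nothing was cut off at the
end. If the six single quotes pair up in file order, each pair folds all
the lines between its opening and closing quote into one record, so the
rows go missing in the middle of the file. A quick arithmetic check in R,
assuming that pairing and that the header takes one line:

```r
# Line numbers reported in the thread as containing a single quote
quote_lines <- c(3262, 10403, 17544, 24685, 31826, 38967)

total_lines <- 42847            # from `wc -l` in the original post
data_lines  <- total_lines - 1  # header = TRUE consumes one line

# Assumption: quotes pair up in file order (1st with 2nd, 3rd with 4th, ...)
opening <- quote_lines[c(TRUE, FALSE)]
closing <- quote_lines[c(FALSE, TRUE)]

# Each pair merges lines opening..closing into a single record,
# so (closing - opening) rows disappear per pair
rows_lost     <- sum(closing - opening)
rows_expected <- data_lines - rows_lost

print(rows_lost)
print(rows_expected)
```

Under these assumptions rows_expected comes out equal to the nrow(a)
reported in the original post, which also fits the fact that tail(a) shows
what looks like the genuine last line of the file.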
On Thu, Aug 25, 2011 at 12:19 PM, zhenjiang xu <zhenjiang.xu at gmail.com> wrote:
> Thanks, Jim. quote='' works. And then I found a single quote in each of
> these lines:
> 3262
> 10403
> 17544
> 24685
> 31826
> 38967
> None of them is near the position where the table got truncated. Why is that?
> And read.table is a great function. Is it possible for it to give a warning
> message when the data gets truncated? In my case I almost overlooked the
> truncation...
> On Thu, Aug 25, 2011 at 11:57 AM, jim holtman <jholtman at gmail.com> wrote:
>>
>> But did you try the following:
>>
>> x <- read.table(...., comment.char = '', quote = '')
>>
>> In most cases there is a missing quote somewhere in your data.
>> Use a text editor and search for single and double quotes.
>>
>> On Thu, Aug 25, 2011 at 11:49 AM, zhenjiang xu <zhenjiang.xu at gmail.com> wrote:
>> > Thanks for your replies. I looked at those lines and didn't spot
>> > anything unusual.
>> >
>> >> tail(a)
>> >         test_id gene_id gene               locus sample_1 sample_2 status
>> > 21418 tY(GUA)J1       - SUP7 chr10:354243-354332 air1rrp6 air2rrp6     OK
>> > 21419 tY(GUA)J2       - SUP4 chr10:542955-543044 air1rrp6 air2rrp6     OK
>> > 21420 tY(GUA)M1       - SUP5 chr13:168794-168883 air1rrp6 air2rrp6     OK
>> > 21421 tY(GUA)M2       - SUP8 chr13:837927-838016 air1rrp6 air2rrp6     OK
>> > 21422  tY(GUA)O       - SUP3 chr15:288191-288280 air1rrp6 air2rrp6     OK
>> > 21423  tY(GUA)Q       -    -   chrmt:70823-70907 air1rrp6 air2rrp6     OK
>> >       value_1 value_2 ln.fold_change. test_stat  p_value  q_value significant
>> > 21418 0.00000  0.0000        0.000000   0.00000 1.000000 1.011650          no
>> > 21419 0.00000  0.0000        0.000000   0.00000 1.000000 1.011480          no
>> > 21420 0.00000  0.0000        0.000000   0.00000 1.000000 1.011500          no
>> > 21421 0.00000  0.0000        0.000000   0.00000 1.000000 1.011520          no
>> > 21422 0.00000  0.0000        0.000000   0.00000 1.000000 1.011550          no
>> > 21423 6.68356 10.7397        0.474301  -1.08614 0.277417 0.455917          no
>> >
>> >
>> > tY(GUA)J1       -       SUP7    chr10:354243-354332     rrp6    air1rrp6        OK      0       0       0       0       1       1.00404 no
>> > tY(GUA)J2       -       SUP4    chr10:542955-543044     rrp6    air1rrp6        OK      0       0       0       0       1       1.00497 no
>> > tY(GUA)M1       -       SUP5    chr13:168794-168883     rrp6    air1rrp6        OK      0       0       0       0       1       1.00492 no
>> > tY(GUA)M2       -       SUP8    chr13:837927-838016     rrp6    air1rrp6        OK      0       0       0       0       1       1.00488 no
>> > tY(GUA)O        -       SUP3    chr15:288191-288280     rrp6    air1rrp6        OK      0       0       0       0       1       1.00485 no
>> > tY(GUA)Q        -       -       chrmt:70823-70907       rrp6    air1rrp6        OK      4.49644 6.68356 0.396365        -0.766052       0.443645        0.634724        no
>> > 15S_rRNA        -       15S_RRNA        chrmt:6545-8194 WT      air2rrp6        OK      2288.88 711.697 -1.16817        2.78772 0.00530801      0.0167772       yes
>> > 21S_rRNA        -       21S_RRNA        chrmt:58008-62447       WT      air2rrp6        OK      4134.59 1927.04 -0.7634 1.58991 0.111855        0.22339 no
>> > ETS1-1  -       ETS1-1  chr12:457732-458432     WT      air2rrp6        OK      3258.97 1114.76 -1.07277        2.91211 0.00359 0.0121587       yes
>> > ETS1-2  -       ETS1-2  chr12:466869-467569     WT      air2rrp6        OK      3258.97 1114.76 -1.07277        2.91211 0.00359 0.0121597       yes
>> >
>> >
>> > On Wed, Aug 24, 2011 at 2:34 PM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
>> >
>> >> Hi,
>> >>
>> >> On Wed, Aug 24, 2011 at 2:18 PM, zhenjiang xu <zhenjiang.xu at gmail.com> wrote:
>> >> > Hi R users,
>> >> >
>> >> > I was using read.table to read a file. The data.frame looked alright,
>> >> > but I found that not all rows were read by read.table. What's wrong
>> >> > with it? It didn't give me any warning or error messages. Why are the
>> >> > data truncated?
>> >> > Thanks.
>> >> >
>> >> > $ wc -l all/isoform_exp.diff
>> >> > 42847 all/isoform_exp.diff
>> >> >
>> >> >> a=read.table('all/isoform_exp.diff', header=T, sep='\t')
>> >> >> nrow(a)
>> >> > [1] 21423
>> >>
>> >> This is a common problem. You need to take a look at the last row that
>> >> was imported, and the rows around 21423 in the original file.
>> >>
>> >> Common causes include stray single or double quotation marks, and
>> >> other special characters in your file like the default comment.char #
>> >>
>> >> Sarah
>> >> --
>> >> Sarah Goslee
>> >> http://www.functionaldiversity.org
>> >>
>> >
>> >
>> >
>> > --
>> > Best,
>> > Zhenjiang
>> >
>>
>>
>>
>> --
>> Jim Holtman
>> Data Munger Guru
>>
>> What is the problem that you are trying to solve?
>
>
>
> --
> Best,
> Zhenjiang
>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?


