[Rd] Bug in read.table?

jgarcia at ija.csic.es
Sun Nov 7 22:56:28 CET 2010


Thanks. Yes, quote="" solves the problem.

From the documentation, however, I would never have guessed that this was
causing the duplicate records. Rather, I would have expected some kind of
warning or error message.
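A check of that kind can be written by hand: count the non-blank data lines and compare the count with the number of rows read.table() returns. The sketch below is one way to do it, assuming one record per physical line (true for these GrafNav exports). The helper name read_table_checked() is hypothetical, and a temporary file holding the four sample lines of Tony Plate's tmp2.txt (quoted below) stands in for a real input file. In the R 2.12.0 session quoted below, where four lines came back as six rows with the default quote = "\"'", this check would have stopped with an error instead of silently returning duplicates.

```r
# Hypothetical helper (not part of R): read a whitespace-separated table and
# stop if the number of rows differs from the number of non-blank data lines.
# Assumes one record per physical line and no comment lines.
read_table_checked <- function(path, skip = 0, quote = "", ...) {
  txt <- readLines(path, warn = FALSE)
  if (skip > 0) txt <- txt[-seq_len(skip)]
  n_lines <- sum(nzchar(trimws(txt)))
  dat <- read.table(path, skip = skip, quote = quote, ...)
  if (nrow(dat) != n_lines) {
    stop(sprintf("read %d rows from %d data lines; check the 'quote' argument",
                 nrow(dat), n_lines))
  }
  dat
}

# The four sample lines of tmp2.txt; "\u00ba" is the "º" sign in the data.
tmp2 <- c(
  "  37.8275120694  -1.2077972583 001\u00ba12'28.07013\"W 037\u00ba49'39.04345\"N",
  "  37.8275121083  -1.2077974806 001\u00ba12'28.07093\"W 037\u00ba49'39.04359\"N",
  "  37.8275118539  -1.2077974338 001\u00ba12'28.07076\"W 037\u00ba49'39.04267\"N",
  "  37.8275119923  -1.2077974626 001\u00ba12'28.07087\"W 037\u00ba49'39.04317\"N"
)
path <- tempfile(fileext = ".txt")
writeLines(tmp2, path)

# quote = "" makes ' and " ordinary characters, so each line is one record.
dat <- read_table_checked(path, header = FALSE, as.is = TRUE)
print(dat)
```

Defaulting the helper to quote = "" mirrors the fix confirmed in this thread; the row-count check still applies if a different quote setting is passed in.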

And yes, I knew that R handles this specific problem gracefully through
duplicated(). I just thought this could be of interest to R-devel.
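For reference, the duplicated() workaround looks roughly like the sketch below. The small data frame is made up from a few H-Ell and LocalUTC values in the output of the original message. Dropping duplicates hides the symptom rather than fixing the quoting (and would also drop any legitimately identical rows), and the surviving rows come back out of time order, so they are re-sorted on a parsed date-time.

```r
# Toy data mimicking the symptom: a block of epochs repeated at the top of
# the table and out of time order (values taken from the original output).
dat <- data.frame(
  H.Ell      = c(368.869, 368.866, 368.998, 368.994, 368.869, 368.866),
  LocalUTCDa = "25/10/2010",
  LocalUTC   = c("17:00:00", "17:00:15", "16:59:00", "16:59:15",
                 "17:00:00", "17:00:15"),
  stringsAsFactors = FALSE
)

# Keep the first occurrence of each complete row.
dedup <- dat[!duplicated(dat), ]

# Restore time order using a parsed date-time rather than the raw strings.
epoch <- as.POSIXct(paste(dedup$LocalUTCDa, dedup$LocalUTC),
                    format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
dedup <- dedup[order(epoch), ]
rownames(dedup) <- NULL
print(dedup)
```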

Thanks to all,
Javier
---

> The problem has to do with the quote characters in the data (R is probably
> interpreting the minutes (') and seconds (") marks as quote delimiters).
>
> With a smaller data file, I can reproduce the strange behavior.
>
> read.table() can read the data correctly if given quote="" to disable the
> interpretation of quote chars.
>
> Contents of tmp2.txt:
>   37.8275120694  -1.2077972583 001º12'28.07013"W 037º49'39.04345"N
>   37.8275121083  -1.2077974806 001º12'28.07093"W 037º49'39.04359"N
>   37.8275118539  -1.2077974338 001º12'28.07076"W 037º49'39.04267"N
>   37.8275119923  -1.2077974626 001º12'28.07087"W 037º49'39.04317"N
>
>
>  > read.table(file.path("tmp2.txt"), header=FALSE, as.is=TRUE)
>          V1        V2                V3                V4
> 1 37.82751 -1.207797 001º12'28.07076"W 037º49'39.04267"N
> 2 37.82751 -1.207797 001º12'28.07087"W 037º49'39.04317"N
> 3 37.82751 -1.207797 001º12'28.07013"W 037º49'39.04345"N
> 4 37.82751 -1.207797 001º12'28.07093"W 037º49'39.04359"N
> 5 37.82751 -1.207797 001º12'28.07076"W 037º49'39.04267"N
> 6 37.82751 -1.207797 001º12'28.07087"W 037º49'39.04317"N
> Warning message:
> In read.table(file.path("tmp2.txt"), header = FALSE, as.is = TRUE) :
>    incomplete final line found by readTableHeader on 'tmp2.txt'
>  > read.table(file.path("tmp2.txt"), header=FALSE, as.is=TRUE, quote="")
>          V1        V2                V3                V4
> 1 37.82751 -1.207797 001º12'28.07013"W 037º49'39.04345"N
> 2 37.82751 -1.207797 001º12'28.07093"W 037º49'39.04359"N
> 3 37.82751 -1.207797 001º12'28.07076"W 037º49'39.04267"N
> 4 37.82751 -1.207797 001º12'28.07087"W 037º49'39.04317"N
>  >
>
> The docs for read.table() direct the reader to the docs for scan()
> regarding the behavior with embedded quote chars.  The behavior of
> read.table() on this data with the default quote chars is puzzling though.
>
> -- Tony Plate
>
> On 11/5/2010 5:22 PM, jgarcia at ija.csic.es wrote:
>> Hi,
>>
>> I'm writing to this list because I'm puzzled by the behaviour of
>> read.table(). It is hard to believe that there is a bug in this utils
>> function, but with my version:
>>
>> R version 2.12.0 alpha (2010-09-28 r53056)
>>
>> I'm using scan() and read.table() to read a number of files that look like this:
>>
>> ---
>>
>> Project:     Murta Sonda
>> Program:     GrafNav Version 8.30.1007
>> Profile:     javier
>> Source:      GPS Epochs(Combined)
>> ProcessInfo: Run (1) by Unknown on 11/04/2010 at 19:05:17
>>
>> Datum:       WGS84, (processing datum)
>> Master 1:    Name LaMurta, Status ENABLED
>>               Antenna height 2.066 m, to L1-PC (NOV702GG, MeasDist 1.980 m to mark/ARP)
>>               Position 37 49 38.15069, -1 12 27.55445, 368.197 m (WGS84, Ellipsoidal hgt)
>> Remote:      Antenna height 1.781 m, to L1-PC (NOV702GG, MeasDist 1.695 m to mark/ARP)
>> UTC Offset:  15 s
>> Local time:  +2.0 h, CEST [Central European Savings Time]
>> Geoid:       EGM2008-World.wpg (Absolute correction)
>>
>>        Latitude      Longitude LonTextLoTextLongitudTextL LatTextLaTextLatitudeTextL        H-Ell        H-MSL LocalUTCDa LocalUTC
>>           (Deg)          (Deg) (DeMi   (Sec)  (DeMi   (Sec)          (m)          (m)      (DMY)       (HMS)
>>   37.8275120694  -1.2077972583 001º12'28.07013"W 037º49'39.04345"N      368.998      318.059 25/10/2010    16:59:00
>>   37.8275121083  -1.2077974806 001º12'28.07093"W 037º49'39.04359"N      368.994      318.055 25/10/2010    16:59:15
>>   37.8275118539  -1.2077974338 001º12'28.07076"W 037º49'39.04267"N      368.997      318.058 25/10/2010    16:59:30
>>   37.8275119923  -1.2077974626 001º12'28.07087"W 037º49'39.04317"N      368.998      318.060 25/10/2010    16:59:45
>>   37.8275323099  -1.2078075891 001º12'28.10732"W 037º49'39.11632"N      368.869      317.930 25/10/2010    17:00:00
>>   37.8275323374  -1.2078077002 001º12'28.10772"W 037º49'39.11641"N      368.866      317.927 25/10/2010    17:00:15
>>   37.8275325076  -1.2078075314 001º12'28.10711"W 037º49'39.11703"N      368.859      317.920 25/10/2010    17:00:30
>>   37.8275325306  -1.2078075056 001º12'28.10702"W 037º49'39.11711"N      368.861      317.922 25/10/2010    17:00:45
>>   37.8275323639  -1.2078075917 001º12'28.10733"W 037º49'39.11651"N      368.853      317.914 25/10/2010    17:01:00
>>   37.8275326222  -1.2078076861 001º12'28.10767"W 037º49'39.11744"N      368.857      317.918 25/10/2010    17:01:15
>> ---
>>
>> with a different number of records in each file.
>>
>> To read the data I'm using:
>>
>> ---
>>   # Column names are on the 17th line of the file (skip the first 16)
>>   dat.names <- scan(file.path("path_and_filename"),
>>                     what = "character",
>>                     skip = 16, nlines = 1)
>>   if (length(dat.names) != 8) {
>>     stop("Input file seems to be wrong!")
>>   }
>>
>>   # Skip the first 18 lines and read the data records
>>   dat <- read.table(file.path("path_and_filename"),
>>                     header = FALSE, col.names = dat.names,
>>                     skip = 18, as.is = TRUE, blank.lines.skip = FALSE)
>> ---
>> and I systematically obtain a number of repeated records at the
>> start of the table (6 in this example). This is easy to see by
>> looking at the "LocalUTC" field:
>>
>>> dat
>>     Latitude Longitude LonTextLoTextLongitudTextL LatTextLaTextLatitudeTextL   H.Ell   H.MSL LocalUTCDa LocalUTC
>> 1  37.82753 -1.207808          001º12'28.10732"W          037º49'39.11632"N 368.869 317.930 25/10/2010 17:00:00
>> 2  37.82753 -1.207808          001º12'28.10772"W          037º49'39.11641"N 368.866 317.927 25/10/2010 17:00:15
>> 3  37.82753 -1.207808          001º12'28.10711"W          037º49'39.11703"N 368.859 317.920 25/10/2010 17:00:30
>> 4  37.82753 -1.207808          001º12'28.10702"W          037º49'39.11711"N 368.861 317.922 25/10/2010 17:00:45
>> 5  37.82753 -1.207808          001º12'28.10733"W          037º49'39.11651"N 368.853 317.914 25/10/2010 17:01:00
>> 6  37.82753 -1.207808          001º12'28.10767"W          037º49'39.11744"N 368.857 317.918 25/10/2010 17:01:15
>> 7  37.82751 -1.207797          001º12'28.07013"W          037º49'39.04345"N 368.998 318.059 25/10/2010 16:59:00
>> 8  37.82751 -1.207797          001º12'28.07093"W          037º49'39.04359"N 368.994 318.055 25/10/2010 16:59:15
>> 9  37.82751 -1.207797          001º12'28.07076"W          037º49'39.04267"N 368.997 318.058 25/10/2010 16:59:30
>> 10 37.82751 -1.207797          001º12'28.07087"W          037º49'39.04317"N 368.998 318.060 25/10/2010 16:59:45
>> 11 37.82753 -1.207808          001º12'28.10732"W          037º49'39.11632"N 368.869 317.930 25/10/2010 17:00:00
>> 12 37.82753 -1.207808          001º12'28.10772"W          037º49'39.11641"N 368.866 317.927 25/10/2010 17:00:15
>> 13 37.82753 -1.207808          001º12'28.10711"W          037º49'39.11703"N 368.859 317.920 25/10/2010 17:00:30
>> 14 37.82753 -1.207808          001º12'28.10702"W          037º49'39.11711"N 368.861 317.922 25/10/2010 17:00:45
>> 15 37.82753 -1.207808          001º12'28.10733"W          037º49'39.11651"N 368.853 317.914 25/10/2010 17:01:00
>> 16 37.82753 -1.207808          001º12'28.10767"W          037º49'39.11744"N 368.857 317.918 25/10/2010 17:01:15
>>
>> Thanks,
>>
>> Javier
>> ---
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
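Applied to the script in the original message quoted above, the fix amounts to passing quote = "" to both scan() and read.table(). The sketch below wraps that script in a hypothetical function, read_grafnav(); the default skip values 16 and 18 are the ones from the original script and assume the exact GrafNav header layout shown there. As an added guard (an assumption, not something proposed in the thread), it uses anyDuplicated() on the date/time columns, since every epoch should have its own time stamp, so a repeat of the silent duplication stops with an error.

```r
# Hypothetical wrapper around the original script, with quoting disabled so
# that the ' and " (minutes/seconds) marks are read as ordinary characters.
read_grafnav <- function(path, names_skip = 16, data_skip = 18) {
  dat.names <- scan(path, what = "character", skip = names_skip,
                    nlines = 1, quote = "", quiet = TRUE)
  if (length(dat.names) != 8) {
    stop("Input file seems to be wrong!")
  }
  dat <- read.table(path, header = FALSE, col.names = dat.names,
                    skip = data_skip, as.is = TRUE, quote = "",
                    blank.lines.skip = FALSE)
  # Repeated date/time pairs mean the records were not read correctly.
  if (anyDuplicated(dat[c("LocalUTCDa", "LocalUTC")]) > 0) {
    stop("Duplicated epochs found; check the quoting of the input file")
  }
  dat
}
```

With the placeholder path from the original script, usage would be dat <- read_grafnav(file.path("path_and_filename")).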


