[R] unz( "x.zip", "y.csv" ) != pipe( "unzip -p x.zip y.csv" )

cberry at tajo.ucsd.edu
Thu Jul 24 02:08:56 CEST 2003


Not sure this is a bug in R. 

Maybe it's a bug in my understanding of unz().

The character 'b2' (hexadecimal) is in position 535 of line 1
of 'naughty.csv'. This character displays as a superscript '2' and came to me
in an Excel file that I converted to comma-separated text ( *.csv ).
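
For anyone wanting to confirm where such a byte sits, here is a minimal sketch that scans the first line of a file for bytes above 0x7f. The helper name `first_line_high_bytes` and the throwaway demo file are assumptions made for illustration; they are not part of R or of the original report.

```r
# Return the 1-based positions of bytes above 0x7f on the first line of a file.
# `first_line_high_bytes` is a hypothetical helper name, not an R function.
first_line_high_bytes <- function(path) {
  # Read the whole file as raw bytes so no text-mode handling gets in the way.
  bytes <- readBin(path, what = "raw", n = file.info(path)$size)
  # The first line ends just before the first newline (0x0a), if there is one.
  nl <- which(bytes == as.raw(0x0a))
  end <- if (length(nl) > 0) nl[1] - 1 else length(bytes)
  which(as.integer(bytes[seq_len(end)]) > 127L)
}

# Demonstration on a throwaway file with byte 0xb2 placed at position 4.
demo <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("m/s"), as.raw(0xb2), charToRaw(",1\nx,2\n")), demo)
high_pos <- first_line_high_bytes(demo)
```

Run against an extracted copy of 'naughty.csv', the same call should report the position described above, assuming the file contains no other high bytes on line 1.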

The first line gets truncated by readLines after 534 characters using
unz():

> nchar( readLines( unz( "bad.zip", "naughty.csv" )))
[1] 534  11   9  22
> nchar(readLines( pipe(" unzip -p bad.zip naughty.csv" ) ))
[1] 809  11   9  22


Attempting to read the same file using scan( unz( ... ) ) concatenates the
rest of the file (including comma separators) onto the word that includes
'b2', while scan( pipe( "unzip ..." ) ) reads all elements.
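
One possible way to sidestep text-mode reading altogether is to pull raw bytes from a connection opened in binary mode and split the lines by hand. The sketch below is untested against the R versions in this report and does not explain the truncation; `read_lines_binary` is a hypothetical helper, and the latin1 default is an assumption about how the file was encoded.

```r
# Read every byte from an already-open binary connection and split on "\n".
# `read_lines_binary` is a hypothetical helper; the latin1 default is an assumption.
read_lines_binary <- function(con, encoding = "latin1") {
  chunks <- list()
  repeat {
    chunk <- readBin(con, what = "raw", n = 65536L)
    if (length(chunk) == 0L) break
    chunks[[length(chunks) + 1L]] <- chunk
  }
  if (length(chunks) == 0L) return(character(0))
  bytes <- do.call(c, chunks)

  # Work out where each line starts and ends, excluding the newline itself.
  nl <- which(bytes == as.raw(0x0a))
  starts <- c(1L, nl + 1L)
  ends <- c(nl - 1L, length(bytes))
  # A trailing newline leaves a start past the end of the data; drop it.
  keep <- starts <= length(bytes)
  starts <- starts[keep]
  ends <- ends[keep]

  lines <- vapply(seq_along(starts), function(i) {
    if (ends[i] < starts[i]) "" else rawToChar(bytes[starts[i]:ends[i]])
  }, character(1))
  Encoding(lines) <- encoding  # mark, do not convert
  lines
}

# Demonstration on a plain file; a zip member would instead use something like
#   con <- unz("bad.zip", "naughty.csv", open = "rb")
demo <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("m/s"), as.raw(0xb2), charToRaw(",1\nx,2\n")), demo)
con <- file(demo, open = "rb")
demo_lines <- read_lines_binary(con)
close(con)
```

Carriage returns are left in place, so files with Windows line endings would need an extra step to strip them.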

>
> options(width = 50 ) # prevent my mailer from line wrapping
>
> nchar(scan(unz( "bad.zip", "naughty.csv") , what="a", sep=",",nlines=1))
Read 45 items
 [1]   5   9  12   8  11   4   2   1   1   8   8
[12]   8   9   5  10   8   6  12  10   8  16  16
[23]  12  14  12  20  10   8   6  12  10   8  16
[34]  16  12  14  12  20  20  18  20  18  13  13
[45] 329
> nchar( scan( pipe(" unzip -p bad.zip naughty.csv" ) , what="a", sep=",",nlines=1) )
Read 62 items
 [1]  5  9 12  8 11  4  2  1  1  8  8  8  9  5 10
[16]  8  6 12 10  8 16 16 12 14 12 20 10  8  6 12
[31] 10  8 16 16 12 14 12 20 20 18 20 18 13 13 10
[46] 13 14 12 12 10 16 14 12 10 16 14 22 20 22 20
[61] 15 15
>
> version    ## LINUX R-1.7.1 gave similar results
         _                   
platform sparc-sun-solaris2.8
arch     sparc               
os       solaris2.8          
system   sparc, solaris2.8   
status                       
major    1                   
minor    7.0                 
year     2003                
month    04                  
day      16                  
language R                   
> 

Chuck


Charles C. Berry                        (858) 534-2098 
                                         Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu         UC San Diego
http://hacuna.ucsd.edu/members/ccb.html  La Jolla, San Diego 92093-0717



