[R] unz( "x.zip", "y.csv" ) != pipe( "unzip -p x.zip y.csv" )
cberry@tajo.ucsd.edu
Thu Jul 24 02:08:56 CEST 2003
I am not sure whether this is a bug in R.
Maybe it's a bug in my understanding of unz().
The character 'b2' (hexadecimal) is in position 535 of line 1
of 'naughty.csv'. This character appears as a superscript '2' and came to me
in an EXCEL file that I converted to text in comma-separated (*.csv)
format.
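One way to confirm where such a byte sits is to read the file as raw bytes, so that no connection-level text handling gets in the way. The following is only a minimal sketch: it builds a small hypothetical stand-in file rather than using 'naughty.csv', and the names tmp, bytes and non_ascii are illustrative.

```r
# Build a small stand-in for 'naughty.csv' (hypothetical content):
# one line that contains the single byte 0xb2, followed by a second line.
tmp <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("a,b,m"), as.raw(0xb2), charToRaw(",c\nx,y\n")), tmp)

# Read the whole file as raw bytes, bypassing any text processing.
bytes <- readBin(tmp, what = "raw", n = file.info(tmp)$size)

# 1-based positions of bytes outside the 7-bit ASCII range.
non_ascii <- which(as.integer(bytes) > 127)
non_ascii          # 6
bytes[non_ascii]   # b2
```

Applied to the extracted file, the same which() call would be expected to report position 535, if the description above is right.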
readLines() truncates the first line after 534 characters when reading
through unz():
> nchar( readLines( unz( "bad.zip", "naughty.csv" )))
[1] 534 11 9 22
> nchar(readLines( pipe(" unzip -p bad.zip naughty.csv" ) ))
[1] 809 11 9 22
Attempting to read the same file using scan( unz( ... ) ) concatenates the
rest of the file (including comma separators) onto the word that includes
'b2', while scan( pipe( "unzip ..." ) ) reads all elements.
>
> options(width = 50 ) # prevent my mailer from line wrapping
>
> nchar(scan(unz( "bad.zip", "naughty.csv") , what="a", sep=",",nlines=1))
Read 45 items
[1] 5 9 12 8 11 4 2 1 1 8 8
[12] 8 9 5 10 8 6 12 10 8 16 16
[23] 12 14 12 20 10 8 6 12 10 8 16
[34] 16 12 14 12 20 20 18 20 18 13 13
[45] 329
> nchar( scan( pipe(" unzip -p bad.zip naughty.csv" ) , what="a", sep=",",nlines=1) )
Read 62 items
[1] 5 9 12 8 11 4 2 1 1 8 8 8 9 5 10
[16] 8 6 12 10 8 16 16 12 14 12 20 10 8 6 12
[31] 10 8 16 16 12 14 12 20 20 18 20 18 13 13 10
[46] 13 14 12 12 10 16 14 12 10 16 14 22 20 22 20
[61] 15 15
>
> version ## LINUX R-1.7.1 gave similar results
_
platform sparc-sun-solaris2.8
arch sparc
os solaris2.8
system sparc, solaris2.8
status
major 1
minor 7.0
year 2003
month 04
day 16
language R
>
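The numbers in the transcript above can be cross-checked against each other with a little arithmetic. The sketch below copies the field widths and line lengths from the output shown. It assumes that each of the three line breaks between the four lines counts as one character, and that nothing after the last line is counted. Under those assumptions the figures are consistent with the description: the field widths add up to the 809-character line, field 45 ends at position 535, and everything from field 45 to the end of the file adds up to the 329 characters reported by scan( unz( ... ) ).

```r
# Field widths of line 1 as reported by scan( pipe( ... ) ) (62 items).
pipe_fields <- c(5, 9, 12, 8, 11, 4, 2, 1, 1, 8, 8, 8, 9, 5, 10,
                 8, 6, 12, 10, 8, 16, 16, 12, 14, 12, 20, 10, 8, 6, 12,
                 10, 8, 16, 16, 12, 14, 12, 20, 20, 18, 20, 18, 13, 13, 10,
                 13, 14, 12, 12, 10, 16, 14, 12, 10, 16, 14, 22, 20, 22, 20,
                 15, 15)
# Line lengths as reported by readLines( pipe( ... ) ).
line_lengths <- c(809, 11, 9, 22)

n <- length(pipe_fields)

# All fields plus the commas between them: the length of line 1.
line1_length <- sum(pipe_fields) + (n - 1)
line1_length    # 809

# Last character of field 45: fields 1..45 plus the 44 commas before it.
b2_field_end <- sum(pipe_fields[1:45]) + 44
b2_field_end    # 535

# Field 45 through the end of the file: the remaining fields of line 1 with
# their commas, then lines 2-4 with 3 line breaks (assumed 1 character each).
tail_length <- sum(pipe_fields[45:n]) + (n - 45) + sum(line_lengths[-1]) + 3
tail_length     # 329
```

This does not explain why unz() behaves this way; it only shows that the three sets of reported numbers agree with one another.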
Chuck
Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu UC San Diego
http://hacuna.ucsd.edu/members/ccb.html La Jolla, San Diego 92093-0717