[R-sig-Debian] invalid multibyte string at '<a0>'

Anne Ghisla a.ghisla at gmail.com
Tue Aug 16 12:51:33 CEST 2011


On Tue, 16 Aug 2011 11:53:53 +0200
Matthieu Stigler <matthieu.stigler at gmail.com> wrote:

> Hi
> 
> I have a problem reading files from Windows... these files have,
> instead of NA in the last column, a special ending '<a0>' which causes
> problems... This problem does not appear while reading the same file
> in Windows! Try:
> read.csv("http://dl.dropbox.com/u/6113358/prob.csv")
> 
> Could you please tell me if you also have this problem? I have tried 
> either by cleaning the file on Linux (fromdos, recode...) or setting 
> different encodings... didn't work!

Hi Matthieu, list,
 
> Any idea? Should be obvious, I feel :-(

I can reproduce the error with the command above. Out of curiosity I
opened the file in Mousepad (a very simple text editor) and didn't see
any sign of the special line endings.
Viewing the file with head showed question marks where the NAs should
be. (I use UTF-8 encoding.)

[anne@galadriel ~]$ head /tmp/prob.csv
Location,,Time Period,Low-birth-weight newborns (%)
Afghanistan,,2004,�
Albania,,2009,�

This is the encoding information given by file:
[anne@galadriel ~]$ file /tmp/prob.csv
/tmp/prob.csv: ISO-8859 English text, with CRLF line terminators
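
For anyone who wants to confirm from within R which byte is involved,
here is a minimal sketch, not a definitive diagnosis. It builds a small
stand-in file with the same header and a 0xA0 byte in the last field
(the real prob.csv is not reproduced here, so the two rows are an
assumption based on the head output above), then counts every byte
above 0x7F. The helper name count_high_bytes is made up for
illustration. In ISO-8859-1, byte 0xA0 is the non-breaking space, which
would be consistent with the '<a0>' in the error message and with the
encoding reported by file.

```r
# Stand-in for prob.csv (assumed layout): CRLF line endings and a single
# 0xA0 byte in the last field of each data row.
demo_path <- tempfile(fileext = ".csv")
demo_bytes <- c(
  charToRaw("Location,,Time Period,Low-birth-weight newborns (%)\r\n"),
  charToRaw("Afghanistan,,2004,"), as.raw(0xa0), charToRaw("\r\n"),
  charToRaw("Albania,,2009,"), as.raw(0xa0), charToRaw("\r\n")
)
writeBin(demo_bytes, demo_path)

# Hypothetical helper: tabulate the non-ASCII bytes found in a file.
count_high_bytes <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  high <- bytes[as.integer(bytes) > 127L]  # keep only bytes above 0x7F
  table(as.character(high))                # raw values print as hex, e.g. "a0"
}

high_counts <- count_high_bytes(demo_path)
print(high_counts)
```

Working on raw bytes keeps the check independent of the locale, so it
should behave the same on Linux and on Windows.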

In Mousepad I used the Replace command: I selected the trailing
whitespace-like character at the end of the first line that gave the
error and replaced every occurrence with nothing. I saved the file under
a new name, and importing it into R gave a correct data frame with NAs
in the right places.
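
The same replace-with-nothing step can probably be done in R itself. The
sketch below assumes the file is in a single-byte encoding such as
ISO-8859-1, where dropping every 0xA0 byte is safe (in UTF-8 that byte
can be part of a longer character, so this would not be appropriate
there). The function name read_csv_without_a0 and the demo rows are
hypothetical; only the header line is taken from the head output.

```r
# Stand-in for prob.csv (assumed layout): CRLF line endings and a single
# 0xA0 byte in the last field of each data row.
demo_path <- tempfile(fileext = ".csv")
demo_bytes <- c(
  charToRaw("Location,,Time Period,Low-birth-weight newborns (%)\r\n"),
  charToRaw("Afghanistan,,2004,"), as.raw(0xa0), charToRaw("\r\n"),
  charToRaw("Albania,,2009,"), as.raw(0xa0), charToRaw("\r\n")
)
writeBin(demo_bytes, demo_path)

# Hypothetical helper: drop every 0xA0 byte, then parse what is left as CSV.
read_csv_without_a0 <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  clean <- bytes[as.integer(bytes) != 160L]  # 160 == 0xA0
  # Split on LF or CRLF ourselves so no stray carriage returns remain.
  lines <- strsplit(rawToChar(clean), "\r?\n", useBytes = TRUE)[[1]]
  read.csv(text = lines)
}

dat <- read_csv_without_a0(demo_path)
str(dat)
```

Another option that may be worth trying is the fileEncoding argument of
read.csv (for example fileEncoding = "latin1"), which asks R to
re-encode the input while reading. The field would then hold a
non-breaking space rather than an invalid byte, so it might still need
to be mapped to NA afterwards.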

I don't know much about proper solutions to encoding issues... still, I
hope this solves your issue. It may also be worth looking for the cause
in the program that generates the file.

> Thanks!!
> 
> Matthieu

all the best,
Anne


